On 9/23/20 6:48 AM, LIU Zhiwei wrote:
>> +    for (i = 0; i < opr_sz_8; i += 2) {
>>          uint64_t d0, d1;
>> -        d0 = n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>> +        d0 = a[i + 0];
> Add once.
>> +        d0 += n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>>          d0 += n[i * 4 + 1] * (uint64_t)m_indexed[i * 4 + 1];
>>          d0 += n[i * 4 + 2] * (uint64_t)m_indexed[i * 4 + 2];
>>          d0 += n[i * 4 + 3] * (uint64_t)m_indexed[i * 4 + 3];
>> -        d1 = n[i * 4 + 4] * (uint64_t)m_indexed[i * 4 + 0];
>> +
>> +        d1 = a[i + 1];
>> +        d1 += n[i * 4 + 4] * (uint64_t)m_indexed[i * 4 + 0];
>>          d1 += n[i * 4 + 5] * (uint64_t)m_indexed[i * 4 + 1];
>>          d1 += n[i * 4 + 6] * (uint64_t)m_indexed[i * 4 + 2];
>>          d1 += n[i * 4 + 7] * (uint64_t)m_indexed[i * 4 + 3];
>> @@ -555,7 +570,6 @@ void HELPER(gvec_udot_idx_h)(void *vd, void *vn, void *vm, uint32_t desc)
>>          d[i + 0] += d0;
> Add twice.
>
> I think it is wrong here. Do you think so?
Yep. Thanks for noticing.

r~
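To make the "add once / add twice" point concrete, here is a minimal sketch of the pattern under review. It is a hypothetical, simplified model, not QEMU's `gvec_udot_idx_h`: it processes one 64-bit lane per iteration instead of the d0/d1 pair, reuses the same four indexed `m` elements for every lane, and the names `dot4`, `udot_idx_buggy` and `udot_idx_fixed` are invented for illustration. Seeding `d0` with the addend `a[i]` already accounts for the accumulator, so the final store must be `d[i] = d0`; keeping `d[i] += d0` adds the old destination on top, which doubles the addend whenever the destination and addend are the same storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of an indexed unsigned dot product:
 * each 64-bit result lane is the addend a[i] plus the dot product of
 * four 16-bit elements of n with the same four 16-bit elements of m.
 * This is NOT QEMU's gvec_udot_idx_h helper.
 */
static uint64_t dot4(const uint16_t *n, const uint16_t *m)
{
    uint64_t sum = 0;
    for (size_t j = 0; j < 4; j++) {
        sum += n[j] * (uint64_t)m[j];   /* widen before multiplying */
    }
    return sum;
}

/* Pattern flagged in review: the addend is folded in when d0 is
 * seeded, and then "+=" adds the old contents of d on top of it. */
static void udot_idx_buggy(uint64_t *d, const uint16_t *n,
                           const uint16_t *m, const uint64_t *a,
                           size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        uint64_t d0 = a[i];             /* add once */
        d0 += dot4(&n[i * 4], m);
        d[i] += d0;                     /* add twice when d == a */
    }
}

/* Corrected pattern: seed with the addend, then store the result. */
static void udot_idx_fixed(uint64_t *d, const uint16_t *n,
                           const uint16_t *m, const uint64_t *a,
                           size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        uint64_t d0 = a[i];
        d0 += dot4(&n[i * 4], m);
        d[i] = d0;                      /* plain store, no second add */
    }
}
```

With `n = {1, 2, 3, 4, 5, 6, 7, 8}`, `m = {1, 1, 1, 1}` and the destination aliased to an addend of `{100, 200}`, the fixed version leaves `{110, 226}` while the buggy one leaves `{210, 426}`: each addend is counted twice.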