On 9/23/20 6:48 AM, LIU Zhiwei wrote:
>> +    for (i = 0; i < opr_sz_8; i += 2) {
>>           uint64_t d0, d1;
>>   -        d0  = n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>> +        d0  = a[i + 0];
> Add once.
>> +        d0 += n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>>           d0 += n[i * 4 + 1] * (uint64_t)m_indexed[i * 4 + 1];
>>           d0 += n[i * 4 + 2] * (uint64_t)m_indexed[i * 4 + 2];
>>           d0 += n[i * 4 + 3] * (uint64_t)m_indexed[i * 4 + 3];
>> -        d1  = n[i * 4 + 4] * (uint64_t)m_indexed[i * 4 + 0];
>> +
>> +        d1  = a[i + 1];
>> +        d1 += n[i * 4 + 4] * (uint64_t)m_indexed[i * 4 + 0];
>>           d1 += n[i * 4 + 5] * (uint64_t)m_indexed[i * 4 + 1];
>>           d1 += n[i * 4 + 6] * (uint64_t)m_indexed[i * 4 + 2];
>>           d1 += n[i * 4 + 7] * (uint64_t)m_indexed[i * 4 + 3];
>> @@ -555,7 +570,6 @@ void HELPER(gvec_udot_idx_h)(void *vd, void *vn, void *vm, uint32_t desc)
>>           d[i + 0] += d0;
> Add twice.
> 
> I think it is wrong here. Do you think so?

Yep.  Thanks for noticing.
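
For the archive, a minimal standalone sketch of the problem being pointed
out. The function names, the single-lane loop, and the test values are all
hypothetical simplifications, not the actual helper: once d0 is seeded from
the accumulator a[], the final store presumably has to be a plain assignment,
otherwise the accumulator is counted twice whenever d aliases a (and stale
contents of d leak in when it does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for the helper under review.
 * Each 64-bit lane accumulates a dot product of four 16-bit elements.
 */

/* Shape flagged in review: accumulator added on seed and again on store. */
static void udot_h_double_add(uint64_t *d, const uint64_t *a,
                              const uint16_t *n, const uint16_t *m,
                              size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        uint64_t d0 = a[i];                     /* add once */
        for (size_t j = 0; j < 4; j++) {
            d0 += n[i * 4 + j] * (uint64_t)m[i * 4 + j];
        }
        d[i] += d0;                             /* add twice if d == a */
    }
}

/* Assumed fix: d0 already carries the accumulator, so just store it. */
static void udot_h_fixed(uint64_t *d, const uint64_t *a,
                         const uint16_t *n, const uint16_t *m,
                         size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        uint64_t d0 = a[i];
        for (size_t j = 0; j < 4; j++) {
            d0 += n[i * 4 + j] * (uint64_t)m[i * 4 + j];
        }
        d[i] = d0;
    }
}

/* Worked example with d aliasing a; the dot product is 5+12+21+32 = 70. */
static void udot_h_demo(void)
{
    const uint16_t n[4] = { 1, 2, 3, 4 };
    const uint16_t m[4] = { 5, 6, 7, 8 };
    uint64_t acc_bad[1] = { 100 };
    uint64_t acc_ok[1] = { 100 };

    udot_h_double_add(acc_bad, acc_bad, n, m, 1);
    udot_h_fixed(acc_ok, acc_ok, n, m, 1);

    assert(acc_bad[0] == 270);                  /* 100 + (100 + 70) */
    assert(acc_ok[0] == 170);                   /* 100 + 70 */
}
```

With a starting accumulator of 100, the double-add shape yields 270 where
170 is expected, which is the discrepancy noted above.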


r~
