https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93696

            Bug ID: 93696
           Summary: AVX512VPOPCNTDQ writemask intrinsics produce incorrect
                    results
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
            Target: i386, x86-64

The writemask (merge-masking) forms of the AVX512VPOPCNTDQ intrinsics generate
incorrect results. When a mask bit is not set, GCC appears to copy the
corresponding element from the third parameter (the source operand), whereas
it should copy it from the first parameter (the pass-through operand).
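
For reference, the expected per-element behaviour can be sketched in plain C.
This is only a scalar model of the semantics described above, not GCC or Intel
code; the names popcount64 and mask_popcnt_epi64_ref are hypothetical and
introduced here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count of one 64-bit lane. */
static unsigned popcount64(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/*
 * Hypothetical scalar model of the merge-masking semantics (not GCC code).
 * Lane i of the result is popcount(src[i]) when bit i of m is set,
 * and dst[i] (taken from the FIRST parameter) otherwise.
 */
static void mask_popcnt_epi64_ref(uint64_t *out, const uint64_t *dst,
                                  uint8_t m, const uint64_t *src,
                                  size_t lanes)
{
    assert(lanes <= 8);   /* an 8-bit mask covers at most 8 lanes */
    for (size_t i = 0; i < lanes; i++) {
        if ((m >> i) & 1u)
            out[i] = popcount64(src[i]);
        else
            out[i] = dst[i];
    }
}
```

With dst = {111, 222}, src = {0xFF, 0x7} and m = 0x1, this model yields
{8, 222}: lane 0 is the population count of src, and lane 1 is passed through
unchanged from dst.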

testcase:
cat test.c

#include <immintrin.h>
__m128i foo (__m128i dst, __mmask8 m, __m128i src)
{
  return _mm_mask_popcnt_epi64 (dst, m, src);
}

gcc10_trunk -O2 -mavx512vpopcntdq -mavx512vl -S test.c

foo(long long __vector(2), unsigned char, long long __vector(2)):
        kmovw   %edi, %k1
        vpopcntq        %xmm0, %xmm1{%k1}
        vmovdqa64       %xmm1, %xmm0
        ret

This is incorrect; the expected code is:

foo(__m128i, unsigned char, __m128i):
        kmovw     %edi, %k1                                     #4.10
        vpopcntq  %xmm1, %xmm0{%k1}                             #4.10
        ret                                                     #4.10

Refer to https://godbolt.org/z/EK12b0

Affected intrinsics

_mm256_mask_popcnt_epi64
_mm_mask_popcnt_epi64
_mm256_mask_popcnt_epi32
_mm_mask_popcnt_epi32
_mm512_mask_popcnt_epi32
_mm512_mask_popcnt_epi64
_mm512_mask_popcnt_epi16
_mm256_mask_popcnt_epi16
_mm_mask_popcnt_epi16
_mm512_mask_popcnt_epi8
_mm256_mask_popcnt_epi8
_mm_mask_popcnt_epi8
