https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
Bug ID: 85538
Summary: kortest for 32 and 64 bit masks incorrectly uses k0
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---

Test case (`-O2 -march=skylake-avx512`, cf. https://godbolt.org/g/ou3oAZ):

    #include <x86intrin.h>

    // bad:
    bool f8(__m512i x, __m512i y) {
        register __mmask64 k asm("%rbx") = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
        return _kortestc_mask64_u8(k, k);
    }

    bool f16(__m512i x, __m512i y) {
        auto k = _mm512_cmp_epi16_mask(x, y, _MM_CMPINT_EQ);
        return _kortestc_mask32_u8(k, k);
    }

    // good:
    bool f32(__m512i x, __m512i y) {
        auto k = _mm512_cmp_epi32_mask(x, y, _MM_CMPINT_EQ);
        return _kortestc_mask16_u8(k, k);
    }

    bool f64(__m512i x, __m512i y) {
        auto k = _mm512_cmp_epi64_mask(x, y, _MM_CMPINT_EQ);
        return _kortestc_mask8_u8(k, k);
    }

The 32-bit and 64-bit masks are correctly assigned to k1 on vpcmp[bw], but
subsequently GCC does some heroics to get k1 assigned into k0 (which shouldn't
be possible, no?) and then calls `kortest[qd] %k0, %k0`. The f32 and f64
functions show the correct behavior.
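
For reference, here is a rough scalar sketch of what the `_kortestc_mask*_u8` calls in the test case are expected to compute: 1 when the bitwise OR of the two masks is all ones (the carry-flag result of `kortest`), 0 otherwise. The name `kortestc_model` is hypothetical and not part of any header; it uses plain unsigned integers in place of `__mmask8/16/32/64`, so it says nothing about register allocation and only illustrates the value each function should return.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model (an assumption for illustration, not GCC code):
// mirrors the carry-flag result of KORTEST, i.e. returns 1 when (a | b)
// has every bit set and 0 otherwise. Mask is an unsigned integer type
// standing in for __mmask8 / __mmask16 / __mmask32 / __mmask64.
template <typename Mask>
constexpr unsigned char kortestc_model(Mask a, Mask b) {
    return static_cast<Mask>(a | b) == std::numeric_limits<Mask>::max() ? 1 : 0;
}

// With k passed as both operands, as in f8..f64, the result is 1 only when
// the compare mask itself is all ones (every lane compared equal).
static_assert(kortestc_model<std::uint64_t>(~0ull, ~0ull) == 1,
              "all 64 byte lanes equal");
static_assert(kortestc_model<std::uint64_t>(~0ull >> 1, ~0ull >> 1) == 0,
              "one byte lane differs");
static_assert(kortestc_model<std::uint8_t>(0xFF, 0xFF) == 1,
              "all 8 qword lanes equal");
```

Whichever mask register the compiler picks, all four functions should agree with this model for the corresponding mask width.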