https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

            Bug ID: 85538
           Summary: kortest for 32 and 64 bit masks incorrectly uses k0
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

Test case (`-O2 -march=skylake-avx512`, cf. https://godbolt.org/g/ou3oAZ):
#include <x86intrin.h>

// bad:
bool f8(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask64_u8(k, k);
}
bool f16(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi16_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask32_u8(k, k);
}

// good:
bool f32(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi32_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask16_u8(k, k);
}
bool f64(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi64_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask8_u8(k, k);
}

The 64-bit and 32-bit masks (f8 and f16) are correctly assigned to k1 by
vpcmp[bw], but GCC then goes through some heroics to move k1 into k0 (which
shouldn't be necessary, no?) and emits `kortest[qd] %k0, %k0`. The f32 and f64
functions show the correct behavior.
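For reference, KORTEST ORs its two mask operands and sets CF when every bit of the result is 1, and `_kortestc_maskN_u8` returns that CF bit. Each function above therefore returns true exactly when every lane of x equals the corresponding lane of y, whichever k register the compiler picks. The sketch below is a scalar model of that expected result, useful for checking output independently of codegen. `kortestc_model` and `f8_model` are hypothetical helpers, not GCC or Intel APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of _kortestc_maskN_u8(a, b): OR the two N-bit masks
// and return 1 (the CF flag set by KORTEST) when every one of the N bits is set.
template <unsigned N>
bool kortestc_model(std::uint64_t a, std::uint64_t b) {
    static_assert(N == 8 || N == 16 || N == 32 || N == 64,
                  "KORTEST[BWDQ] operate on 8/16/32/64-bit masks");
    // Low N bits set; for N == 64 the shift count is 0, giving all ones.
    const std::uint64_t all_ones = ~std::uint64_t{0} >> (64 - N);
    return ((a | b) & all_ones) == all_ones;
}

// Hypothetical scalar counterpart of f8: bit i of k is set when byte i of x
// equals byte i of y, mirroring _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ).
bool f8_model(const std::array<std::int8_t, 64>& x,
              const std::array<std::int8_t, 64>& y) {
    std::uint64_t k = 0;
    for (unsigned i = 0; i < 64; ++i)
        if (x[i] == y[i]) k |= std::uint64_t{1} << i;
    return kortestc_model<64>(k, k);
}
```

Comparing f8's result against `f8_model` for the same 64 bytes (all equal, then one byte changed) checks the value the kortestq must produce; the analogous models for f16, f32 and f64 use 32, 16 and 8 lanes.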
