[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

Matthias Kretz changed:

           What    |Removed      |Added
    ----------------------------------------
         Status    |NEW          |RESOLVED
     Resolution    |---          |INVALID

--- Comment #4 from Matthias Kretz ---
I can't figure out what the issue here was, because k0 can certainly be
written to and used for kortest. The only restriction is that k0 cannot be
used as a writemask (predicate operand), which is not the case here.
Codegen wrt. mask registers has improved considerably as well.
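For later readers, the asymmetry comment #4 describes can be sketched in a
few instructions (my own summary, not quoted from the thread): k0 is a
normal mask register as a source or destination operand; only the writemask
encoding {%k0} is unavailable, because that encoding means "unmasked":

    kmovq    %rax, %k0                   # fine: k0 can be written
    kortestq %k0, %k0                    # fine: k0 as a kortest source
    vpaddb   %zmm1, %zmm0, %zmm2{%k1}    # fine: k1-k7 as writemask
    # vpaddb %zmm1, %zmm0, %zmm2{%k0}    # not encodable: {k0} = "no mask"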
--- Comment #3 from Matthias Kretz ---
Some more observations:

1. The instruction sequence

     kmovq %k1,-0x8(%rsp)
     vmovq -0x8(%rsp),%xmm1
     vmovq %xmm1,%rax
     kmovq %rax,%k0

   should be a simple `kmovq %k1,%k0` instead.

2. Adding `asm("");` before the compare intrinsic makes the problem go away.

3. Using inline asm in place of the kortest intrinsic shows the same
   preference for using the k0 register. Test case:

     void bad(__m512i x, __m512i y) {
       auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
       asm("kmovq %0,%%rax" :: "k"(k));
     }

4. The following test case still unnecessarily prefers k0, but does it with
   a nicer `kmovq %k1,%0`:

     auto almost_good(__m512i x, __m512i y) {
       auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
       asm("kmovq %0, %0" : "+k"(k));
       return k;
     }

(cf. https://godbolt.org/g/hZTga4 for 2, 3 and 4)
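Spelled out, observation 2's workaround would look roughly like this (my
reconstruction; the thread doesn't show the exact function, and the
placement of the empty asm is an assumption):

    // Reconstruction of observation 2 (assumed form):
    bool workaround(__m512i x, __m512i y) {
      asm("");  // empty asm before the compare; makes the bad spill go away
      __mmask64 k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
      return _kortestc_mask64_u8(k, k);
    }

An empty asm statement is an optimization barrier of sorts, so its effect
here suggests the problem is sensitive to how the RA partitions the function.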
Jakub Jelinek changed:

           What          |Removed      |Added
    ---------------------------------------------------------
         Status          |UNCONFIRMED  |NEW
     Last reconfirmed    |             |2018-04-26
                 CC      |             |jakub at gcc dot gnu.org,
                         |             |vmakarov at gcc dot gnu.org
     Ever confirmed      |0            |1

--- Comment #2 from Jakub Jelinek ---
Seems RA goes wild for some reason. Doesn't reproduce with -O2
-march=skylake-avx512 -mtune=generic, but does e.g. with -O2
-march=skylake-avx512 -mtune=intel. Only two insns are involved in f8:

(insn 10 8 12 2 (set (reg:DI 95)
        (unspec:DI [
                (subreg:V64QI (reg/v:V8DI 93 [ x ]) 0)
                (reg:V64QI 22 xmm1 [ y ])
                (const_int 0 [0])
            ] UNSPEC_PCMP)) "include/avx512bwintrin.h":3058 1740
     {avx512bw_cmpv64qi3}
     (expr_list:REG_DEAD (reg/v:V8DI 93 [ x ])
        (expr_list:REG_DEAD (reg:V64QI 22 xmm1 [ y ])
            (nil))))

(insn 12 10 13 2 (set (reg:CC 17 flags)
        (unspec:CC [
                (reg:DI 95)
                (reg:DI 95)
            ] UNSPEC_KORTEST)) "include/avx512bwintrin.h":128 1398
     {kortestdi}
     (expr_list:REG_DEAD (reg:DI 95)
        (nil)))

IRA seems to think MASK_EVEX_REGS is most beneficial for r95. That is what
actually is used for the first insn, which has the Yk constraint and thus
requires the k1-k7 registers, but the second insn has the k constraint,
which allows k0-k7, and for some reason the k1 value is moved through the
stack into k0. Vlad, could you please have a look?
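For context on the Yk/k distinction Jakub mentions (a summary of GCC's
documented x86 machine constraints, not quoted from the thread):

    "k"   any mask register, k0-k7
    "Yk"  any mask register usable as a predicate, i.e. k1-k7

So the compare pattern demands k1-k7 while kortest accepts k0-k7; neither
constraint forces the value through the stack, which is why the spill
sequence in comment #3 points at a register-allocation problem rather than
a constraint problem.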
--- Comment #1 from Matthias Kretz ---
Sorry, I was trying to force GCC to use the k1 register and playing with
register asm (which didn't have any effect at all). f8 should actually be
(cf. https://godbolt.org/g/hSkoJV):

    bool f8(__m512i x, __m512i y) {
      __mmask64 k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
      return _kortestc_mask64_u8(k, k);
    }
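For reference, this is roughly the codegen one would hope for from f8 (my
sketch, not output quoted in the thread; exact registers and the presence of
vzeroupper may differ by GCC version and tuning). _kortestc_mask64_u8
returns the carry flag, which kortestq sets iff the OR of its two source
masks is all ones:

    f8:
        vpcmpeqb  %zmm1, %zmm0, %k0    # compare produces the mask directly
        kortestq  %k0, %k0             # CF = (k0 | k0) == all-ones
        setc      %al                  # _kortestc_mask64_u8 returns CF
        ret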