[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2019-09-11 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

Matthias Kretz  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from Matthias Kretz  ---
I can't figure out what the issue here was: k0 can certainly be written to and
used for kortest. The only restriction is that k0 cannot be used as a
writemask (predicate operand), which is not the case here.
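
A minimal sketch of that rule (mine, not from the original report; compile
with -mavx512bw): writing k0 and kortest'ing it is legal, only naming k0 as a
{k} writemask on a vector instruction would be rejected by the assembler.

#include <immintrin.h>

bool k0_is_usable(__m512i x, __m512i y) {
    __mmask64 k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    bool all_ones;
    asm("kmovq %1, %%k0\n\t"      // writing k0: fine
        "kortestq %%k0, %%k0"     // kortest on k0: fine, sets CF iff all ones
        : "=@ccc"(all_ones)       // flag output: carry flag
        : "k"(k)
        : "k0");
    return all_ones;
}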

Codegen wrt. mask registers has improved considerably as well.

[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2018-04-27 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

--- Comment #3 from Matthias Kretz  ---
Some more observations:

1. The instruction sequence:

kmovq %k1,-0x8(%rsp)
vmovq -0x8(%rsp),%xmm1
vmovq %xmm1,%rax
kmovq %rax,%k0

   should be a simple `kmovq %k1,%k0` instead.

2. Adding `asm("");` before the compare intrinsic makes the problem go away.
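
   Applied to f8 from comment #1, the workaround would look like this (my
   reconstruction of the reporter's setup):

#include <immintrin.h>

bool f8_workaround(__m512i x, __m512i y) {
    asm(""); // empty asm acts as an optimization barrier and avoids the spill
    __mmask64 k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask64_u8(k, k);
}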

3. Using inline asm in place of the kortest intrinsic shows the same preference
for using the k0 register. Test case:

#include <immintrin.h>

void bad(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    asm("kmovq %0,%%rax" :: "k"(k) : "rax"); // the kmovq writes rax, so it is declared as clobbered
}

4. The following test case still unnecessarily prefers k0, but does it with a
nicer `kmovq %k1,%0`:

auto almost_good(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    asm("kmovq %0, %0" : "+k"(k));
    return k;
}

(cf. https://godbolt.org/g/hZTga4 for 2, 3 and 4)

[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2018-04-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-26
 CC||jakub at gcc dot gnu.org,
   ||vmakarov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Jakub Jelinek  ---
Seems the register allocator goes wild for some reason. It doesn't reproduce
with -O2 -march=skylake-avx512 -mtune=generic, but does e.g. with
-O2 -march=skylake-avx512 -mtune=intel.

Only two insns are involved in f8:
(insn 10 8 12 2 (set (reg:DI 95)
(unspec:DI [
(subreg:V64QI (reg/v:V8DI 93 [ x ]) 0)
(reg:V64QI 22 xmm1 [ y ])
(const_int 0 [0])
] UNSPEC_PCMP)) "include/avx512bwintrin.h":3058 1740
{avx512bw_cmpv64qi3}
 (expr_list:REG_DEAD (reg/v:V8DI 93 [ x ])
(expr_list:REG_DEAD (reg:V64QI 22 xmm1 [ y ])
(nil))))
(insn 12 10 13 2 (set (reg:CC 17 flags)
(unspec:CC [
(reg:DI 95)
(reg:DI 95)
] UNSPEC_KORTEST)) "include/avx512bwintrin.h":128 1398 {kortestdi}
 (expr_list:REG_DEAD (reg:DI 95)
(nil)))
and IRA seems to think MASK_EVEX_REGS is most beneficial for r95.
That is what is actually used for the first insn, which has the Yk constraint
and thus requires the k1-k7 regs; but the second insn has the k constraint,
which allows the k0-k7 regs, and for some reason the k1 value is moved through
the stack into k0.
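
For reference, the same two constraint classes are usable from inline asm (a
sketch of mine; "k" and "Yk" are GCC's documented x86 machine constraints):

#include <immintrin.h>

__mmask64 constraint_demo(__mmask64 k) {
    // "Yk" only matches k1-k7 (masks usable as a predicate),
    // while plain "k" matches any of k0-k7.
    asm("kmovq %1, %0" : "=k"(k) : "Yk"(k));
    return k;
}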

Vlad, could you please have a look?

[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

2018-04-26 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

--- Comment #1 from Matthias Kretz  ---
Sorry, I was trying to force GCC to use the k1 register and was playing with
register asm (which didn't have any effect at all). f8 should actually be (cf.
https://godbolt.org/g/hSkoJV):

#include <immintrin.h>

bool f8(__m512i x, __m512i y) {
    __mmask64 k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    return _kortestc_mask64_u8(k, k);
}