https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Wojciech Mula from comment #6)
> Hongtao, thank you for your patch and for pinging back! I checked the code
> from this issue against version 11.2.0 (Debian 11.2.0-14), but still, there
> are KMOVQs before performing any bit ops. Here is the output from `gcc -O3
> -march=icelake-server -S`
>
> vpcmpub $0, .LC0(%rip), %zmm0, %k0
> vpcmpub $0, .LC1(%rip), %zmm0, %k1
> vpcmpub $0, .LC2(%rip), %zmm0, %k2
> kmovq %k0, %rcx
> kmovq %k1, %rax
> orq %rcx, %rax
> kmovq %k2, %rdx
> orq %rdx, %rax
> ret
Oh, yes. Because of PR101185, the mask register alternative is slightly
disliked, so mask bitwise instructions are generated only if both src and dest
are mask registers, i.e.
#include <immintrin.h>
__m512i
foo_orq (__m512i a, __m512i b, __m512i c, __m512i d)
{
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
  return _mm512_mask_add_epi8 (c, m1 | m2, a, d);
}
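
For comparison, here is a minimal sketch that puts the two situations side by
side. It is not taken from the patch or the testsuite; the function names
match_any_gpr and add_where_match, the pointer-based signatures, and the use of
the target attribute (so that it builds without -march flags) are assumptions
made for illustration only. In the first function the OR result is returned as
an integer, so it has to end up in a general-purpose register, as in comment
#6. In the second function the OR result is consumed directly as a write mask,
which is the case where a mask bitwise instruction would be expected. The
actual code generated will depend on the GCC version and options.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical example: the combined mask is returned as a plain
   integer, so the destination of the OR is a general-purpose register.  */
__attribute__ ((target ("avx512bw")))
uint64_t
match_any_gpr (const uint8_t *p, uint8_t x, uint8_t y)
{
  __m512i v = _mm512_loadu_si512 (p);
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (v, _mm512_set1_epi8 ((char) x));
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (v, _mm512_set1_epi8 ((char) y));
  return m1 | m2;
}

/* Hypothetical example: the combined mask is used only as the write mask
   of a masked add, so both sources and the destination of the OR can stay
   in mask registers.  Bytes equal to x or y are incremented by one, all
   other bytes are copied unchanged.  */
__attribute__ ((target ("avx512bw")))
void
add_where_match (uint8_t *dst, const uint8_t *p, uint8_t x, uint8_t y)
{
  __m512i v = _mm512_loadu_si512 (p);
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (v, _mm512_set1_epi8 ((char) x));
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (v, _mm512_set1_epi8 ((char) y));
  __m512i r = _mm512_mask_add_epi8 (v, m1 | m2, v, _mm512_set1_epi8 (1));
  _mm512_storeu_si512 (dst, r);
}
```

Both functions read 64 bytes from p, so the caller has to provide buffers of
at least that size, and they should only be called on a CPU that supports
AVX512BW (for example after checking __builtin_cpu_supports ("avx512bw")).
Compiling with -O3 -S and comparing the two function bodies is one way to see
whether the OR stays in mask registers.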