https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #12 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #10)
> Looks like this was fixed in GCC 15:
> ```
> foo:
> .LFB7284:
>         .cfi_startproc
>         vmovd %edi, %xmm2
>         vmovdqa32 %zmm1, %zmm4
>         kmovw m(%rip), %k1
>         vpsrad %xmm2, %zmm0, %zmm4{%k1}
>         vmovdqa32 %zmm4, %zmm0
>         ret
> ```
>
> Though for comment #5 we get:
> ```
> foo:
> .LFB7470:
>         .cfi_startproc
>         vmovdqa64 %zmm0, %zmm3
>         vmovd %edi, %xmm2
>         vmovdqa32 %zmm1, %zmm0
>         kmovw m(%rip), %k1
>         vmovdqa32 %zmm1, %zmm4
>         vpslld %xmm2, %zmm3, %zmm0{%k1}
>         kmovw m(%rip), %k2
>         vpsrad %xmm2, %zmm3, %zmm4{%k2}
>         vmovdqa32 %zmm0, zzz(%rip)
>         vmovdqa32 %zmm4, %zmm0
>         ret
> ```
>
> Note the extra kmovw.

The extra kmovw is gone if you add -mavx512bw.