https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93172
Bug ID: 93172
Summary: with AVX512 masked mov assigning zero can use {z}
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Testcase (cf. https://godbolt.org/z/DMQf9-):

  #include <x86intrin.h>

  // missed optimization:
  __m512 f(__m512 x, __mmask16 k) {
    return _mm512_mask_mov_ps(x, _knot_mask16(k), __m512());
  }

  // f should be translated like this:
  __m512 g(__m512 x, __mmask16 k) {
    return _mm512_maskz_mov_ps(k, x);
  }

GCC translates f to:

  vxorps xmm1, xmm1, xmm1
  kmovw k1, edi
  vmovaps zmm0{k1}, zmm1

It could use:

  kmovd k0, edi
  knotw k1, k0
  vmovaps zmm0 {k1} {z}, zmm0

like g does. I.e. whenever a constant zero is assigned under a negated
write-mask, the {z} variant of vmovaps should be used.

Clang even uses {z} for `_mm512_mask_mov_ps(x, k, __m512())` (i.e. without
negation of the mask). It is unclear whether that is actually a
pessimization: https://godbolt.org/z/Nn4qXz
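
For reference, the equivalence of f and g that the report relies on can be checked without AVX512 hardware using a scalar model of the two intrinsics. The sketch below is an illustration only and is not part of the original testcase: `Vec16`, `model_mask_mov`, `model_maskz_mov`, `model_f`, `model_g` and `f_equals_g_for_all_masks` are hypothetical names, and the lane semantics assumed are those commonly documented for `_mm512_mask_mov_ps` (lane taken from the second vector where the mask bit is set, otherwise from the first) and `_mm512_maskz_mov_ps` (lane taken from the vector where the mask bit is set, otherwise zero).

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-in for __m512: 16 float lanes.
using Vec16 = std::array<float, 16>;

// Assumed model of _mm512_mask_mov_ps(src, k, a):
// lane i is a[i] if bit i of k is set, otherwise src[i].
Vec16 model_mask_mov(const Vec16& src, std::uint16_t k, const Vec16& a) {
    Vec16 r{};
    for (int i = 0; i < 16; ++i)
        r[i] = ((k >> i) & 1u) ? a[i] : src[i];
    return r;
}

// Assumed model of _mm512_maskz_mov_ps(k, a):
// lane i is a[i] if bit i of k is set, otherwise 0.
Vec16 model_maskz_mov(std::uint16_t k, const Vec16& a) {
    Vec16 r{};
    for (int i = 0; i < 16; ++i)
        r[i] = ((k >> i) & 1u) ? a[i] : 0.0f;
    return r;
}

// Model of f from the testcase: assign zero under the negated mask.
Vec16 model_f(const Vec16& x, std::uint16_t k) {
    return model_mask_mov(x, static_cast<std::uint16_t>(~k), Vec16{});
}

// Model of g from the testcase: zero-masking move under the original mask.
Vec16 model_g(const Vec16& x, std::uint16_t k) {
    return model_maskz_mov(k, x);
}

// Exhaustively compare both models over all 65536 mask values.
bool f_equals_g_for_all_masks(const Vec16& x) {
    for (unsigned k = 0; k <= 0xFFFFu; ++k) {
        const auto m = static_cast<std::uint16_t>(k);
        if (model_f(x, m) != model_g(x, m))
            return false;
    }
    return true;
}
```

Under these assumed semantics both functions keep x in the lanes where k is set and produce zero elsewhere, which is why a single zero-masking move under k would suffice for f.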