https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85090
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |itsimbal at gcc dot gnu.org,
                   |                            |kyukhin at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The reason why we do something so weird rather than ix86_expand_vector_set is
that sse.md lacks vec_setv32hi and vec_setv64qi patterns; IMHO it should have
them.
E.g. on:
typedef short V __attribute__((vector_size (64)));
V foo (V x, int y) { x[0] = y; return x; }
V bar (V x, int y) { x[7] = y; return x; }
V baz (V x, int y) { x[11] = y; return x; }
we generate completely terrible code with -O2 -mavx512f -mtune=intel or
-O2 -mavx512bw -mtune=intel: moving the word out of the vector, performing
masking etc. on the GPRs and then inserting it again.

clang emits:
        vpinsrw $0, %edi, %xmm0, %xmm2
        vpblendd        $15, %ymm2, %ymm0, %ymm0
(and s/$0/$7/) for foo/bar with -O2 -mavx512f, which makes me wonder if the
VEX 256-bit vpblendd with 4 arguments really doesn't clear the upper 256 bits;
128-bit vpblendd is documented to clear them. And for baz:
        vextracti128    $1, %ymm0, %xmm2
        vpinsrw $3, %edi, %xmm2, %xmm2
        vinserti128     $1, %xmm2, %ymm0, %ymm0

With -O2 -mavx512bw they emit:
        vpinsrw $0, %edi, %xmm0, %xmm1
        vinserti32x4    $0, %xmm1, %zmm0, %zmm0
(and s/$0/$7/) for foo/bar, though none of those instructions actually require
AVX512BW. And for baz:
        vextracti128    $1, %ymm0, %xmm1
        vpinsrw $3, %edi, %xmm1, %xmm1
        vinserti32x4    $1, %xmm1, %zmm0, %zmm0
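
To make the immediates in the clang sequences above easier to follow, here is
a rough, portable C sketch of the lane-based approach, using the same GCC
vector extension as the test case: pick the 128-bit lane that holds the
element, replace one word in it, and put the lane back. The helper name
set_via_lane and the V8 typedef are hypothetical, introduced only for
illustration. This is not the sse.md expansion itself; it only models the
semantics and is not guaranteed to compile to those instructions.

```c
#include <string.h>

/* 32 x 16-bit elements, as in the test case.  */
typedef short V __attribute__((vector_size (64)));
/* One 128-bit lane: 8 x 16-bit elements (hypothetical helper type).  */
typedef short V8 __attribute__((vector_size (16)));

/* Hypothetical helper: set element IDX (0..31) of *X to VAL by going
   through the 128-bit lane that contains it.  */
static void
set_via_lane (V *x, int idx, short val)
{
  int lane = idx / 8;   /* which 128-bit lane (extract/insert immediate) */
  int pos = idx % 8;    /* word within that lane (vpinsrw immediate) */
  V8 tmp;

  /* Extract the lane.  */
  memcpy (&tmp, (char *) x + lane * sizeof (V8), sizeof (V8));
  /* Insert the word into the lane.  */
  tmp[pos] = val;
  /* Put the lane back into the wide vector.  */
  memcpy ((char *) x + lane * sizeof (V8), &tmp, sizeof (V8));
}
```

For index 11 this gives lane 1 and word 3, matching the $1 and $3 immediates
in the baz sequences; indices 0 and 7 both fall in lane 0, which is why
foo/bar need no extract step.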