https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85090

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |itsimbal at gcc dot gnu.org,
                   |                            |kyukhin at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The reason why we do something so weird rather than using
ix86_expand_vector_set is that sse.md lacks vec_setv32hi and vec_setv64qi
patterns; IMHO it should have them.

E.g. on:
typedef short V __attribute__((vector_size (64)));

V
foo (V x, int y)
{
  x[0] = y;
  return x;
}

V
bar (V x, int y)
{
  x[7] = y;
  return x;
}

V
baz (V x, int y)
{
  x[11] = y;
  return x;
}

we generate completely terrible code with -O2 -mavx512f -mtune=intel or
-O2 -mavx512bw -mtune=intel: we move the word out of the vector, perform
masking etc. on the GPRs and then insert it again.
clang emits:
        vpinsrw $0, %edi, %xmm0, %xmm2
        vpblendd $15, %ymm2, %ymm0, %ymm0
(and s/$0/$7/) for foo/bar with -O2 -mavx512f, which makes me wonder whether
the VEX 256-bit vpblendd with 4 arguments really doesn't clear the upper 256
bits; the 128-bit vpblendd is documented to clear them.  And for baz:
        vextracti128 $1, %ymm0, %xmm2
        vpinsrw $3, %edi, %xmm2, %xmm2
        vinserti128 $1, %xmm2, %ymm0, %ymm0
With -O2 -mavx512bw they emit:
        vpinsrw $0, %edi, %xmm0, %xmm1
        vinserti32x4 $0, %xmm1, %zmm0, %zmm0
(and s/$0/$7/) for foo/bar, though none of those instructions actually require
AVX512BW.  And for baz:
        vextracti128 $1, %ymm0, %xmm1
        vpinsrw $3, %edi, %xmm1, %xmm1
        vinserti32x4 $1, %xmm1, %zmm0, %zmm0
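
For illustration only, the lane split behind the baz sequences above (element
11 is word 3 of 128-bit chunk 1) can be sketched in plain GNU C.  set_via_lane
is a hypothetical helper, not anything that exists in GCC or clang, and the
memcpy calls merely stand in for the extract/insert instructions; this shows
the index arithmetic only, not how a vec_set expander would be written.

```c
#include <assert.h>
#include <string.h>

typedef short V __attribute__((vector_size (64)));   /* 32 x 16-bit, as in the testcase */
typedef short V8 __attribute__((vector_size (16)));  /* one 128-bit chunk, 8 x 16-bit */

/* Hypothetical helper: set element IDX of *X by going through the 128-bit
   chunk that contains it.  Takes a pointer so the sketch does not depend on
   the 512-bit vector calling convention.  */
static void
set_via_lane (V *x, int idx, short y)
{
  assert (idx >= 0 && idx < 32);

  int lane = idx / 8;   /* which 128-bit chunk: 11 -> 1 */
  int pos = idx % 8;    /* which word inside that chunk: 11 -> 3 */
  V8 chunk;

  /* Extract the chunk (stands in for vextracti128 / vextracti32x4).  */
  memcpy (&chunk, (char *) x + lane * sizeof chunk, sizeof chunk);
  /* Insert the word (stands in for vpinsrw $pos).  */
  chunk[pos] = y;
  /* Put the chunk back (stands in for vinserti128 / vinserti32x4 $lane).  */
  memcpy ((char *) x + lane * sizeof chunk, &chunk, sizeof chunk);
}
```

For foo and bar the chunk index is 0, so the extract step is unnecessary (the
low 128 bits are already addressable as %xmm0), which matches the shorter
two-instruction sequences quoted above.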
