https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for set with T == int and N == 32 we could generate

        vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
        vmovd   %esi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret

.LC0:
        .long   0
        .long   1
        .long   2
        .long   3
        .long   4
        .long   5
        .long   6
        .long   7

aka, with GCC generic vectors

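V setg (V v, int idx, T val)
{
  /* Broadcast the index and the value to all lanes.  */
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* All-ones in the lane whose number equals idx, zero elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}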
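
For experimenting with the idea, here is a minimal self-contained sketch.
It assumes T is int and V is a 32-byte generic vector (the typedefs are
not given in this comment), and the names set_sketch, set_ref, same_lanes
and self_check are made up for illustration.  It broadcasts both idx and
val, builds the lane mask by comparing against {0, 1, ..., 7}, and checks
the result against a plain subscripted store.

```c
#include <assert.h>

/* Assumed element and vector types: eight ints in a 32-byte vector.  */
typedef int T;
typedef T V __attribute__((vector_size(32)));

/* Branch-free variable-index insert using a compare-and-select.  */
static V set_sketch(V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Each mask lane is -1 where the lane number equals idx, else 0.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  return (v & ~mask) | (valv & mask);
}

/* Scalar reference: a variable-index store via vector subscripting.  */
static V set_ref(V v, int idx, T val)
{
  v[idx] = val;
  return v;
}

/* Return 1 if all eight lanes of a and b are equal, else 0.  */
static int same_lanes(V a, V b)
{
  for (int i = 0; i < 8; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Compare the sketch against the reference for every valid index.  */
static void self_check(void)
{
  V v = {10, 11, 12, 13, 14, 15, 16, 17};
  for (int idx = 0; idx < 8; idx++)
    assert(same_lanes(set_sketch(v, idx, -1), set_ref(v, idx, -1)));
}
```

Note that with an out-of-range idx the masked form simply returns v
unchanged, whereas the subscripted store would be undefined.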


There's an ongoing patch iteration on the mailing list adding variable-index
vec_set expanders for powerpc (and the related middle-end changes).  The
question is whether optabs should try several strategies itself or whether
the target should get to choose (probably the better option).

There may also be a more efficient way to generate {0, 1, 2, 3...}.
