https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530

--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #6)
> With my patch for V4QI, we still don't get the best code:
>   vect_perm_even_271 = VEC_PERM_EXPR <vect__1.7_264, vect__1.8_266, { 0, 2,
> 4, 6 }>;
>   vect_perm_even_273 = VEC_PERM_EXPR <vect__1.9_268, vect__1.10_270, { 0, 2,
> 4, 6 }>;
>   vect_perm_even_275 = VEC_PERM_EXPR <vect_perm_even_271,
> vect_perm_even_273, { 0, 2, 4, 6 }>;
> 
> _275={_264[0], _264[2], _268[0], _268[2]} or
> VEC_PERM<_264, _268, {0, 2, 4, 6}>
> 
> but for some reason we don't reduce it to that perm
> 
> And there is still a lot of extra PERMS than there should be.

That's because this loop is not something that can be fixed by using V4QI (we
tried before).

This loop requires improvements to SCEV and SLP. It's loading 16 sequential
bytes, as there's no gap between the p1 and p2 values across iterations.

So this loop should be vectorized with V16QI and widening additions. So I don't
think this is related to the other example.
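
As a rough illustration of what "V16QI and widening additions" means, here is
a minimal scalar sketch. It is not the testcase from this PR: the function name
widening_add16 and the loop shape are assumptions made purely for illustration.
It reads 16 contiguous bytes from each input and adds them in a wider type,
which is the kind of pattern a vectorizer could in principle map to a single
V16QI load per input followed by widening adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not taken from the PR: add two runs of 16
   contiguous bytes, producing 16-bit sums so that no carry is lost.  */
void
widening_add16 (const uint8_t *a, const uint8_t *b, uint16_t *out)
{
  for (size_t i = 0; i < 16; i++)
    /* Each byte is widened to 16 bits before the addition.  */
    out[i] = (uint16_t) a[i] + (uint16_t) b[i];
}
```

Whether a given compiler version actually emits widening add instructions for
this depends on the target and options, so treat it only as a picture of the
intended shape.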

So I'll take it back, as it requires actual vectorizer work and is part of the
things we're trying to address in GCC 15.
