https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117605
Bug ID: 117605
Summary: SLP vectorization fails for negative stride
interleaving
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Testing with --param vect-force-slp=1 reveals (via
gcc.target/aarch64/sve/strided_load_4.c and a few others) that we do not SLP
vectorize
void foo (unsigned *restrict dest, unsigned *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * -100];
}
note: Detected single element interleaving *_8 step -400
missed: permutation requires at least three vectors _9 = *_8;
The non-SLP path classifies this access as VMAT_ELEMENTWISE, the SLP path as
VMAT_CONTIGUOUS_REVERSE.  The non-SLP path never considers the latter for
grouped accesses.
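
For reference, a minimal C sketch of what the VMAT_ELEMENTWISE strategy amounts
to for this loop, assuming a vectorization factor of 4 (foo_elementwise, the
lanes array and the fixed factor are illustrative, not generated code):

/* Illustrative only: each lane is loaded with its own scalar load at the
   -100 element stride and the lanes are then combined, which is roughly
   what elementwise (strided) vectorization does.  */
void foo_elementwise (unsigned *restrict dest, unsigned *src, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      unsigned lanes[4];                  /* stand-in for a vector register */
      for (int l = 0; l < 4; ++l)
        lanes[l] = src[(i + l) * -100];   /* one scalar load per lane */
      for (int l = 0; l < 4; ++l)
        dest[i + l] += lanes[l];
    }
  for (; i < n; ++i)                      /* scalar epilogue */
    dest[i] += src[i * -100];
}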
The easiest solution is to extend the existing demotion of VMAT_CONTIGUOUS to
VMAT_ELEMENTWISE for large groups so that it also covers
VMAT_CONTIGUOUS_REVERSE.
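
A minimal standalone sketch of that idea, modelling the classification decision
rather than the actual tree-vect-stmts.cc code (the enum values mirror the real
VMAT_* kinds, but choose_load_vmat, its parameters and the group-size threshold
are illustrative assumptions):

enum vmat
{
  VMAT_CONTIGUOUS,
  VMAT_CONTIGUOUS_REVERSE,
  VMAT_ELEMENTWISE
};

/* Hypothetical helper: pick the access kind for a grouped load of
   GROUP_SIZE elements per scalar iteration with NUNITS vector lanes and
   NEGATIVE_STEP telling whether the stride is negative.  The suggestion is
   that the "group too large, fall back to elementwise" demotion applies to
   the negative-step case as well, not only to forward VMAT_CONTIGUOUS.  */
static enum vmat
choose_load_vmat (unsigned group_size, unsigned nunits, int negative_step)
{
  enum vmat kind = negative_step ? VMAT_CONTIGUOUS_REVERSE : VMAT_CONTIGUOUS;

  /* When a single group spans more than a vector, contiguous (possibly
     reversed) loads plus permutes do not work out; use elementwise loads
     instead, as the non-SLP path already does.  */
  if (group_size > nunits)
    kind = VMAT_ELEMENTWISE;

  return kind;
}

int
main (void)
{
  /* For the testcase, step -400 bytes corresponds to a group of 100
     unsigneds; with 4-lane vectors this demotes to VMAT_ELEMENTWISE.  */
  return choose_load_vmat (100, 4, 1) == VMAT_ELEMENTWISE ? 0 : 1;
}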