https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122746
Bug ID: 122746
Summary: in-order SLP reduction not implemented
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
void foo (float * __restrict sums, float *a, float *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}
This can be vectorized with two SLP reduction groups of size two, but we'll
have a VF of 2 here, which then rules out the non-fast-math in-order
reduction case because of
  if (!reduc_chain
      && known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1u))
    ;
which is because vectorize_fold_left_reduction does not implement
reducing the two reductions separately (which can be done with
a pairwise vector reduction type if available and supported).
Using the fold-left-plus IFNs is of course not possible, since they fold
all lanes of a vector into a single scalar.
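
For illustration, here is a scalar emulation of what reducing the two
reductions separately could look like at VF 2. This is only a sketch, not
what the vectorizer emits: foo_ref and foo_inorder_vf2 are made-up names,
and the two-lane arrays stand in for a hypothetical two-element vector
accumulator per SLP group. Each four-element load is split into a low and
a high half, and the halves are added in iteration order, so every
sums[k] sees its addends in the same order as in the scalar loop.

```c
/* Scalar reference: the loop from the report.  */
static void
foo_ref (float * __restrict sums, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}

/* Hypothetical emulation of an in-order reduction at VF 2.  acc_a and
   acc_b model one two-lane accumulator per SLP reduction group.  */
static void
foo_inorder_vf2 (float * __restrict sums, const float *a, const float *b,
                 int n)
{
  float acc_a[2] = { sums[0], sums[1] };
  float acc_b[2] = { sums[2], sums[3] };
  int i = 0;

  /* Main loop: two scalar iterations per step, i.e. four elements of
     a and four elements of b.  */
  for (; i + 1 < n; i += 2)
    {
      const float *va = a + 2*i;
      const float *vb = b + 2*i;

      /* Low half first (scalar iteration i) ...  */
      for (int lane = 0; lane < 2; ++lane)
        {
          acc_a[lane] = acc_a[lane] + va[lane];
          acc_b[lane] = acc_b[lane] + vb[lane];
        }
      /* ... then high half (scalar iteration i + 1).  */
      for (int lane = 0; lane < 2; ++lane)
        {
          acc_a[lane] = acc_a[lane] + va[2 + lane];
          acc_b[lane] = acc_b[lane] + vb[2 + lane];
        }
    }

  /* Scalar epilogue for odd n.  */
  for (; i < n; ++i)
    {
      acc_a[0] = acc_a[0] + a[2*i];
      acc_a[1] = acc_a[1] + a[2*i+1];
      acc_b[0] = acc_b[0] + b[2*i];
      acc_b[1] = acc_b[1] + b[2*i+1];
    }

  sums[0] = acc_a[0];
  sums[1] = acc_a[1];
  sums[2] = acc_b[0];
  sums[3] = acc_b[1];
}
```

Since the order of additions per sums[k] is unchanged, both functions
should give identical results without -ffast-math (assuming no excess
precision, e.g. SSE math on x86_64).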