https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122746

            Bug ID: 122746
           Summary: in-order SLP reduction not implemented
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

void foo (float * __restrict sums, float *a, float *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}

This can be vectorized with two SLP reduction groups of size two, but that
gives a VF of 2, which rules out the non-fast-math in-order reduction case
because of

      if (!reduc_chain
          && known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1u))
        ;

This check exists because vectorize_fold_left_reduction does not implement
reducing the two reductions separately (which can be done with
a pairwise vector reduction type if available and supported).
Using the fold-left-plus IFNs is of course not possible, since they fold
all lanes of a vector into a single scalar.
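
To illustrate what "reducing the two reductions separately" would have to
preserve, here is a minimal scalar model. It is only a sketch, not how
vectorize_fold_left_reduction is or would be implemented. The names
foo_ref, foo_vf2_model and models_agree are hypothetical, and the 4-lane
"vector" is assumed to be a plain float[4] standing in for a V4SF register.
With VF == 2, lanes 0 and 2 feed the first reduction of a group and lanes
1 and 3 feed the second, so each reduction must be folded over its own
lanes in order.

```c
#include <assert.h>
#include <string.h>

/* Scalar reference: the loop from the report, evaluated strictly in order.  */
static void foo_ref (float * __restrict sums, const float *a, const float *b,
                     int n)
{
  for (int i = 0; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}

/* Hypothetical model of a VF == 2 in-order SLP reduction.  Each main-loop
   iteration consumes one 4-lane "vector" from a and one from b, covering
   two scalar iterations.  */
static void foo_vf2_model (float * __restrict sums, const float *a,
                           const float *b, int n)
{
  assert (n >= 0);
  int i = 0;
  for (; i + 2 <= n; i += 2)
    {
      float va[4], vb[4];   /* stand-ins for V4SF loads (assumption) */
      memcpy (va, &a[2*i], sizeof va);
      memcpy (vb, &b[2*i], sizeof vb);
      /* Fold each reduction over its own lanes, lowest lane first:
         lanes {0,2} go to the first sum, lanes {1,3} to the second.  */
      for (int lane = 0; lane < 4; lane += 2)
        {
          sums[0] = sums[0] + va[lane];
          sums[1] = sums[1] + va[lane + 1];
          sums[2] = sums[2] + vb[lane];
          sums[3] = sums[3] + vb[lane + 1];
        }
    }
  /* Scalar epilogue for an odd trip count.  */
  for (; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}

/* Returns 1 if both versions produce bit-identical sums on inputs where
   the order of float additions matters.  */
int models_agree (void)
{
  const float a[10] = { 1e8f, 1.0f, 1.0f, -1e8f, -1e8f,
                        3.0f, 0.5f, 1e8f, 2.0f, 0.25f };
  const float b[10] = { 0.1f, 1e7f, 1e7f, 0.2f, -1e7f,
                        0.3f, 0.4f, -1e7f, 5.0f, 6.0f };
  float ref[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
  float mod[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
  foo_ref (ref, a, b, 5);
  foo_vf2_model (mod, a, b, 5);
  return memcmp (ref, mod, sizeof ref) == 0;
}
```

Calling models_agree () from a test driver checks that the per-pair fold
performs the same additions in the same order as the scalar loop. A single
fold over all four lanes into one scalar would instead mix sums[0] and
sums[1].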
