https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440
Bug ID: 114440
Summary: Fail to recognize a chain of lane-reduced operations for loop reduction vect
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: fxue at os dot amperecomputing.com
Target Milestone: ---

In a loop reduction path containing a lane-reduced operation (DOT_PROD/SAD/WIDEN_SUM), the current vectorizer cannot handle the pattern if the path also contains other operations, whether normal or lane-reduced. A pseudo example:

  char *d0, *d1;
  char *s0, *s1;
  char *w;
  int *n;

  ...
  int sum = 0;

  for (i) {
    ...
    sum += d0[i] * d1[i];       /* DOT_PROD */
    ...
    sum += abs(s0[i] - s1[i]);  /* SAD */
    ...
    sum += w[i];                /* WIDEN_SUM */
    ...
    sum += n[i];                /* Normal */
    ...
  }

  ... = sum;

In this case the reduction vectype varies from one operation to the next, which causes a mismatch between the number of vectorized defs and the number of vectorized uses. One possible fix is to generate extra trivial pass-through copies. Take a concrete example:

  sum = 0;
  for (i) {
    sum += d0[i] * d1[i];       /* 16*char -> 4*int */
    sum += n[i];                /*   4*int -> 4*int */
  }

The final vectorized statements could be:

  sum_v0 = { 0, 0, 0, 0 };
  sum_v1 = { 0, 0, 0, 0 };
  sum_v2 = { 0, 0, 0, 0 };
  sum_v3 = { 0, 0, 0, 0 };

  for (i / 16) {
    sum_v0 += DOT_PROD (v_d0[i: 0 .. 15], v_d1[i: 0 .. 15]);
    sum_v1 += 0;  // copy
    sum_v2 += 0;  // copy
    sum_v3 += 0;  // copy

    sum_v0 += v_n[i:  0 .. 3 ];
    sum_v1 += v_n[i:  4 .. 7 ];
    sum_v2 += v_n[i:  8 .. 11];
    sum_v3 += v_n[i: 12 .. 15];
  }

  sum = REDUC_PLUS(sum_v0 + sum_v1 + sum_v2 + sum_v3);

In the sequence above, each summation statement forms exactly one pattern. However, it is easy to compose a slightly more complicated variant that runs into a similar situation.
That is, a chain of lane-reduced operations can come from the non-reduction addend of a single summation statement, for example:

  sum += d0[i] * d1[i] + abs(s0[i] - s1[i]) + n[i];

Handling this probably requires extending the vector pattern formation stage so that it splits such patterns.
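
To make the intended semantics concrete, here is a rough scalar sketch in plain C that emulates the four-accumulator sequence above and checks it against the original loop. It is not vectorizer code and not a proposed patch. All function names are hypothetical. `signed char` is used in place of `char` as an assumption. The mapping of the 16 products onto the 4 int lanes of sum_v0 is also an assumption (real DOT_PROD lane placement is target-dependent), and the trip count is assumed to be a multiple of 16. The last two functions show the source-level equivalent of splitting the compound statement; the report suggests doing this inside pattern formation, not in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Scalar reference for the concrete example in the report. */
int sum_scalar(const signed char *d0, const signed char *d1,
               const int *n, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += d0[i] * d1[i];  /* DOT_PROD candidate */
        sum += n[i];           /* normal add */
    }
    return sum;
}

/* Emulates the proposed form: four accumulators of 4 int lanes each. */
int sum_four_accumulators(const signed char *d0, const signed char *d1,
                          const int *n, size_t len)
{
    int acc[4][4] = {{0}};     /* acc[v] plays the role of sum_v<v> */
    assert(len % 16 == 0);     /* assumption: no epilogue loop needed */
    for (size_t i = 0; i < len; i += 16) {
        /* DOT_PROD: 16 char products folded into the 4 lanes of sum_v0
           (lane assignment k / 4 is an assumption of this sketch). */
        for (int k = 0; k < 16; k++)
            acc[0][k / 4] += d0[i + k] * d1[i + k];
        /* sum_v1 .. sum_v3 += 0: pass-through copies, nothing to compute. */
        /* Normal add: 16 ints spread over all four accumulators. */
        for (int k = 0; k < 16; k++)
            acc[k / 4][k % 4] += n[i + k];
    }
    /* Epilogue: add the accumulators, then reduce across lanes. */
    int sum = 0;
    for (int v = 0; v < 4; v++)
        for (int l = 0; l < 4; l++)
            sum += acc[v][l];
    return sum;
}

/* The compound variant: one statement, three addends. */
int sum_chain_single(const signed char *d0, const signed char *d1,
                     const signed char *s0, const signed char *s1,
                     const int *n, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += d0[i] * d1[i] + abs(s0[i] - s1[i]) + n[i];
    return sum;
}

/* Same computation with the addend split into one statement per pattern. */
int sum_chain_split(const signed char *d0, const signed char *d1,
                    const signed char *s0, const signed char *s1,
                    const int *n, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += d0[i] * d1[i];        /* DOT_PROD candidate */
        sum += abs(s0[i] - s1[i]);   /* SAD candidate */
        sum += n[i];                 /* normal add */
    }
    return sum;
}

/* Returns 1 if every rewritten form agrees with its scalar reference. */
int demo_all_forms_agree(void)
{
    enum { LEN = 32 };
    signed char d0[LEN], d1[LEN], s0[LEN], s1[LEN];
    int n[LEN];
    for (int i = 0; i < LEN; i++) {
        d0[i] = (signed char)(i % 7 - 3);
        d1[i] = (signed char)(i % 5 - 2);
        s0[i] = (signed char)(i % 11);
        s1[i] = (signed char)(i % 3);
        n[i] = 100 * i;
    }
    return sum_scalar(d0, d1, n, LEN) == sum_four_accumulators(d0, d1, n, LEN)
        && sum_chain_single(d0, d1, s0, s1, n, LEN)
           == sum_chain_split(d0, d1, s0, s1, n, LEN);
}
```

The sketch only shows that the pass-through copies leave the result unchanged: sum_v1 .. sum_v3 skip the DOT_PROD step but still receive their share of n[i], so the number of accumulator defs matches the number of uses in the normal add. Small input values are used so that no signed overflow can occur.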