https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814
--- Comment #3 from Feng Xue <fxue at os dot amperecomputing.com> ---
The pattern that matches this code belongs to a generic dot-product category, so we could consider mapping it to the native dot-product instruction with a constant "1" operand:

        movi    v29.16b, 0x1
    .L4:
        ldr     q31, [x1], 16
        cmeq    v31.16b, v28.16b, v31.16b
        and     v31.16b, v29.16b, v31.16b
        udot    v30.4s, v31.16b, v29.16b
        cmp     x5, x1
        bne     .L4
        addv    s31, v30.4s
        fmov    w1, s31

If the accumulation does not require widening, as in this case, then REDUC_PLUS becomes applicable; it can be seen as a special instance of the dot-product instruction. One point to note, though: this kind of REDUC_PLUS should be treated as touching the whole vector register, setting the first element and clearing the rest. Either way, it would end up as an addv.

For SVE, since the element count is variable (it depends on the runtime vector length), the element type may not be wide enough to hold the accumulated result, so only dot-product could be used.

Moreover, this approach could be extended to handle conditional accumulation:

    for (i) {
      if (cond)
        sum += a;    // => sum += cond * a;
    }