https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814

--- Comment #3 from Feng Xue <fxue at os dot amperecomputing.com> ---
The code matches a generic dot-product pattern, so we could consider mapping
it to the native dot-product instruction with a constant "1" operand:

        movi    v29.16b, 0x1
.L4:
        ldr     q31, [x1], 16
        cmeq    v31.16b, v28.16b, v31.16b
        and     v31.16b, v29.16b, v31.16b
        udot    v30.4s, v31.16b, v29.16b
        cmp     x5, x1
        bne     .L4
        addv    s31, v30.4s
        fmov    w1, s31
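
For reference, the loop above can be modelled in scalar C. This is only a
sketch: the source loop is not quoted in this comment, so count_eq() is an
assumption inferred from the assembly (count the bytes equal to a key), and
udot4() and count_eq_dot() are hypothetical names that mimic one 32-bit lane
of udot with an all-ones second operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scalar source (inferred from the assembly, not taken from the
   bug report): count the bytes in p[0..n) that equal key.  */
uint32_t count_eq(const uint8_t *p, size_t n, uint8_t key)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (p[i] == key);
    return sum;
}

/* Model of one 32-bit udot lane: dot product of four unsigned bytes.  */
static uint32_t udot4(const uint8_t a[4], const uint8_t b[4])
{
    return (uint32_t)a[0] * b[0] + (uint32_t)a[1] * b[1]
         + (uint32_t)a[2] * b[2] + (uint32_t)a[3] * b[3];
}

/* Same count expressed as a dot product with a constant-1 operand.  */
uint32_t count_eq_dot(const uint8_t *p, size_t n, uint8_t key)
{
    static const uint8_t ones[4] = { 1, 1, 1, 1 };   /* movi 0x1 */
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint8_t m[4];
        for (int k = 0; k < 4; k++) {
            uint8_t mask = (p[i + k] == key) ? 0xff : 0x00;  /* cmeq */
            m[k] = mask & 1;                                  /* and  */
        }
        sum += udot4(m, ones);                                /* udot */
    }
    for (; i < n; i++)          /* scalar tail for leftover elements */
        sum += (p[i] == key);
    return sum;
}
```

Both functions return the same count; the second only makes the
"multiply by 1 and accumulate into a wider type" structure explicit.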

And if value accumulation does not require widening, as in this case, then
REDUC_PLUS can be used, which could be seen as a special instance of the
dot-product instruction. One point to note: this kind of REDUC_PLUS should be
treated as writing the whole vector register, modifying the first element and
clearing the rest. Either way, it would become an addv.

For SVE, since the element count is variable, the element type may not be able
to hold the accumulation result, so only dot-product could be used.
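
One way to read the SVE concern, as a sketch: if each lane holds 0 or 1 in a
byte, a non-widening sum over a 16-lane Advanced SIMD vector is at most 16,
but an SVE vector can have up to 256 byte lanes (2048 bits), and 256 does not
fit in a byte. The function names below are hypothetical and only model the
arithmetic, not any GCC internal.

```c
#include <stddef.h>
#include <stdint.h>

/* Non-widening reduction: the sum stays in the element type (uint8_t),
   so it wraps modulo 256.  */
uint8_t reduc_plus_u8(const uint8_t *lanes, size_t nlanes)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < nlanes; i++)
        sum = (uint8_t)(sum + lanes[i]);
    return sum;
}

/* Widening reduction: the sum is kept in a 32-bit accumulator, as a
   dot-product with a constant-1 operand would do.  */
uint32_t reduc_plus_widen(const uint8_t *lanes, size_t nlanes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nlanes; i++)
        sum += lanes[i];
    return sum;
}
```

With 16 lanes of 1 both functions agree, while with 256 lanes of 1 the
non-widening version wraps around and the widening one does not.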

Moreover, the approach could be extended to handle conditional accumulation
such as:

   for (i) {
     if (cond)
       sum += a;   // => sum += cond * a;
   }
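
A minimal C sketch of that rewrite, assuming unsigned byte inputs and a
32-bit accumulator (the types, array names and function names are
illustrative, not taken from the bug report):

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: add a[i] only when the condition c[i] is non-zero.  */
uint32_t cond_sum_branch(const uint8_t *a, const uint8_t *c, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i] != 0)
            sum += a[i];
    }
    return sum;
}

/* Dot-product form: the condition becomes a 0/1 multiplier, so the
   loop body is a multiply-accumulate of two byte vectors.  */
uint32_t cond_sum_dot(const uint8_t *a, const uint8_t *c, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cond = (c[i] != 0);      /* 0 or 1 */
        sum += (uint32_t)cond * a[i];    /* sum += cond * a */
    }
    return sum;
}
```

The two forms compute the same value; the second has the
multiply-and-accumulate shape that a dot-product instruction expects.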
