https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112327

            Bug ID: 112327
           Summary: RVV: Redundant vmv1r for widen reduction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include "riscv_vector.h"

void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
{
    size_t vl;
    vint16m4_t vSrcA, vSrcB;
    vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
    while (n > 0) {
        vl = __riscv_vsetvl_e16m4(n);
        vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
        vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
        vSum = __riscv_vwredsum_vs_i32m8_i64m1(__riscv_vwmul_vv_i32m8(vSrcA,
vSrcB, vl), vSum, vl);
        pSrcA += vl;
        pSrcB += vl;
        n -= vl;
    }
    *result = __riscv_vmv_x_s_i64m1_i64(vSum);
}

https://godbolt.org/z/sb8G7ExKP

GCC:

...
vmv1r.v v2,v1
...
vwredsum.vs     v1,v8,v2

Clang:

vwredsum.vs     v8, v24, v8

The root cause is that we don't allow vwredsum.vs vd,vs2,vs1, vs1 overlaps vd.

Reply via email to