On 11/1/23 00:56, Juzhe-Zhong wrote:

Consider the following intrinsic code:

#include <riscv_vector.h>

void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
{
     size_t vl;
     vint16m4_t vSrcA, vSrcB;
     vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
     while (n > 0) {
         vl = __riscv_vsetvl_e16m4(n);
         vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
         vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
         vSum = __riscv_vwredsum_vs_i32m8_i64m1(
             __riscv_vwmul_vv_i32m8(vSrcA, vSrcB, vl), vSum, vl);
         pSrcA += vl;
         pSrcB += vl;
         n -= vl;
     }
     *result = __riscv_vmv_x_s_i64m1_i64(vSum);
}

https://godbolt.org/z/vWd35W7G6
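
For reference, here is a rough scalar sketch of what the intrinsic loop above computes, as I read it: each pair of 16-bit elements is multiplied into a 32-bit product (vwmul), and the products are then sign-extended and accumulated into a single 64-bit sum (vwredsum). The function name scalar_dot_prod is hypothetical and is not part of the patch or its tests; it only illustrates the arithmetic, not the register allocation issue being fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for rvv_dot_prod; not part of the patch.
   Mirrors the widening steps of the vector code:
     vwmul.vv    : int16 x int16 -> int32 (cannot overflow)
     vwredsum.vs : int32 elements sign-extended and summed into int64  */
static int64_t scalar_dot_prod(const int16_t *a, const int16_t *b, uint32_t n)
{
    int64_t sum = 0; /* corresponds to vSum initialised via vmv.s.x with 0 */

    for (uint32_t i = 0; i < n; i++) {
        /* Widening multiply: the product of two int16 values fits in int32. */
        int32_t prod = (int32_t)a[i] * (int32_t)b[i];

        /* Widening reduction: accumulate the product into the 64-bit sum. */
        sum += (int64_t)prod;
    }
    return sum;
}
```

The accumulator is both read and written on every iteration, which is why the reduction's scalar input and its destination can share one vector register in the vector version.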

Before this patch (note the extra vmv1r.v copy inside the loop):

...
Loop:
...
vmv1r.v v2,v1
...
vwredsum.vs     v1,v8,v2
...

After this patch:

...
Loop:
...
vwredsum.vs     v1,v8,v1
...

        PR target/112327

gcc/ChangeLog:

        * config/riscv/vector.md: Add '0'.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/pr112327-1.c: New test.
        * gcc.target/riscv/rvv/base/pr112327-2.c: New test.

OK
jeff
