On 12/22/23 02:51, Juzhe-Zhong wrote:
Consider the following case:

foo:
         ble     a0,zero,.L11
         lui     a2,%hi(.LANCHOR0)
         addi    sp,sp,-128
         addi    a2,a2,%lo(.LANCHOR0)
         mv      a1,a0
         vsetvli a6,zero,e32,m8,ta,ma
         vid.v   v8
         vs8r.v  v8,0(sp)                     ---> spill
.L3:
         vl8re32.v       v16,0(sp)            ---> reload
         vsetvli a4,a1,e8,m2,ta,ma
         li      a3,0
         vsetvli a5,zero,e32,m8,ta,ma
         vmv8r.v v0,v16
         vmv.v.x v8,a4
         vmv.v.i v24,0
         vadd.vv v8,v16,v8
         vmv8r.v v16,v24
         vs8r.v  v8,0(sp)                    ---> spill
.L4:
         addiw   a3,a3,1
         vadd.vv v8,v0,v16
         vadd.vi v16,v16,1
         vadd.vv v24,v24,v8
         bne     a0,a3,.L4
         vsetvli zero,a4,e32,m8,ta,ma
         sub     a1,a1,a4
         vse32.v v24,0(a2)
         slli    a4,a4,2
         add     a2,a2,a4
         bne     a1,zero,.L3
         li      a0,0
         addi    sp,sp,128
         jr      ra
.L11:
         li      a0,0
         ret

The cost model picks an unexpectedly large LMUL (LMUL = 8), which causes the spills marked above.

The root cause is that we didn't include the PHI initial value in the dynamic LMUL
calculation:

   # j_17 = PHI <j_11(9), 0(5)>                       ---> # vect_vec_iv_.8_24 = PHI 
<_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0 }(5)>

We didn't count the initial value { 0, 0, ..., 0 } (32 zero elements) as
consuming a vector register, but the register allocator does allocate a vector
register group for it.
Yup. There are analogues in the scalar space. Depending on the context, we might consider the value live on the edge, at the end of e->src, or at the start of e->dest.

In the scalar space we commonly have multiple constant values, and we try to account for them as best we can, since each distinct constant can result in a constant load. We also try to find pseudos that happen to already hold the value we want so that they participate in the coalescing process. I doubt either of these cases is particularly important for vector, though.



This patch adds the missing count. With it applied, we pick the ideal
LMUL (LMUL = M4):

foo:
        ble     a0,zero,.L9
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        mv      a2,a0
        vsetivli        zero,16,e32,m4,ta,ma
        vid.v   v20
.L3:
        vsetvli a3,a2,e8,m1,ta,ma
        li      a5,0
        vsetivli        zero,16,e32,m4,ta,ma
        vmv4r.v v16,v20
        vmv.v.i v12,0
        vmv.v.x v4,a3
        vmv4r.v v8,v12
        vadd.vv v20,v20,v4
.L4:
        addiw   a5,a5,1
        vmv4r.v v4,v8
        vadd.vi v8,v8,1
        vadd.vv v4,v16,v4
        vadd.vv v12,v12,v4
        bne     a0,a5,.L4
        slli    a5,a3,2
        vsetvli zero,a3,e32,m4,ta,ma
        sub     a2,a2,a3
        vse32.v v12,0(a4)
        add     a4,a4,a5
        bne     a2,zero,.L3
.L9:
        li      a0,0
        ret
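A rough consistency check, read off the two listings (register names come from the generated code, not from the cost model's dump): in the LMUL = 4 inner loop .L4, five vector values are live at once (v4, v8, v12, v16, v20). In the LMUL = 8 version the four groups v0, v8, v16 and v24 already fill the register file, so the fifth value, the outer induction vector, is spilled to the stack.

```latex
\begin{aligned}
\text{LMUL} = 8 &: \quad 5 \times 8 = 40 > 32 \quad \text{(one group must be spilled)} \\
\text{LMUL} = 4 &: \quad 5 \times 4 = 20 \le 32 \quad \text{(all five groups fit)}
\end{aligned}
```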

Tested with --with-arch=gcv, no regressions. OK for trunk?

        PR target/113112

gcc/ChangeLog:

        * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine
        dump information.
        (preferred_new_lmul_p): Include PHI initial value in live regs
        calculation.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
OK assuming you've done the necessary regression testing.

jeff
