Consider the following case:
foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v v8
vs8r.v v8,0(sp) ---> spill
.L3:
vl8re32.v v16,0(sp) ---> reload
vsetvli a4,a1,e8,m2,ta,ma
li a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v v8,0(sp) ---> spill
.L4:
addiw a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li a0,0
addi sp,sp,128
jr ra
.L11:
li a0,0
ret
We unexpectedly pick LMUL = 8, which forces the spill and reloads marked above.
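For orientation, the assembly is consistent with a scalar loop nest of roughly
the following shape. This is a reconstruction from the generated code, not the
contents of pr113112-1.c; the array name `a` and the bound `N` are assumptions.

```c
/* Hypothetical reconstruction of the loop nest; not the actual test file.  */
#define N 64 /* assumed array bound */

int a[N];

/* Callers must pass n <= N.  */
int
foo (int n)
{
  for (int i = 0; i < n; i++)
    {
      int sum = 0;
      /* j is the inner induction variable and its PHI starts at 0.  When the
         outer loop is vectorized, that 0 becomes a { 0, ..., 0 } vector.  */
      for (int j = 0; j < n; j++)
        sum += i + j;
      a[i] = sum;
    }
  return 0;
}
```

The inner induction variable `j` is the one whose PHI initial value matters for
the register count discussed here.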
The root cause is that we didn't take the PHI initial value into account in the
dynamic LMUL calculation:
# j_17 = PHI <j_11(9), 0(5)> ---> # vect_vec_iv_.8_24 = PHI
<_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0 }(5)>
We didn't count the initial value { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming vector registers, but a vector
register group is allocated for it.
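The effect of the missing count can be sketched with a toy register-budget
check. This is an illustration only, not the code in riscv-vector-costs.cc:
the helper names `regs_needed` and `pick_lmul`, and the simplification that
every live value occupies one full LMUL-sized register group, are assumptions.
The counts of 4 versus 5 live values are read off the assembly, not taken from
the cost model dump.

```c
#include <assert.h>

/* RVV has 32 architectural vector registers.  */
#define NUM_VECTOR_REGS 32

/* Hypothetical helper: registers needed if each of LIVE_VALUES occupies
   one register group of LMUL registers.  */
int
regs_needed (int live_values, int lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  return live_values * lmul;
}

/* Hypothetical helper: largest LMUL whose demand fits the register file.  */
int
pick_lmul (int live_values)
{
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (regs_needed (live_values, lmul) <= NUM_VECTOR_REGS)
      return lmul;
  return 1;
}
```

With 4 counted values, `pick_lmul` returns 8 because 4 * 8 = 32 just fits.
Counting the PHI initial value as a fifth value gives 5 * 8 = 40 registers at
LMUL = 8, which exceeds 32, so the sketch falls back to LMUL = 4.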
This patch fixes the missing count. After this patch we pick the ideal LMUL
(LMUL = M4), and the spill and reloads are gone:
foo:
ble a0,zero,.L9
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
mv a2,a0
vsetivli zero,16,e32,m4,ta,ma
vid.v v20
.L3:
vsetvli a3,a2,e8,m1,ta,ma
li a5,0
vsetivli zero,16,e32,m4,ta,ma
vmv4r.v v16,v20
vmv.v.i v12,0
vmv.v.x v4,a3
vmv4r.v v8,v12
vadd.vv v20,v20,v4
.L4:
addiw a5,a5,1
vmv4r.v v4,v8
vadd.vi v8,v8,1
vadd.vv v4,v16,v4
vadd.vv v12,v12,v4
bne a0,a5,.L4
slli a5,a3,2
vsetvli zero,a3,e32,m4,ta,ma
sub a2,a2,a3
vse32.v v12,0(a4)
add a4,a4,a5
bne a2,zero,.L3
.L9:
li a0,0
ret
Tested with --with-arch=gcv, no regressions. OK for trunk?
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine
dump information.
(preferred_new_lmul_p): Take PHI initial value into account in live
regs calculation.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.