Re: [PATCH] RISC-V: Apply vla vs. vls mode heuristic vector COST model

2023-12-12 Thread Robin Dapp
Given that it's almost verbatim aarch64's implementation and the
general approach appears sensible, LGTM.

Regards
 Robin



[PATCH] RISC-V: Apply vla vs. vls mode heuristic vector COST model

2023-12-12 Thread Juzhe-Zhong
This patch applies the VLA vs. VLS mode heuristic, which fixes the following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2

The root cause of these FAILs is that we failed to pick a VLS mode for the vectorization.
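
For readers without the testsuite at hand, the kind of loop involved can be sketched as below. This is a hypothetical reconstruction inferred from the assembly in this message (a trip count of 16, 32-bit elements, sixteen scalar compare-and-abort sequences); it is not the actual source of pr111751.c, and the name foo2_sketch and the initial values are assumptions.

```c
#include <stdlib.h>

#define N 16

/* Hypothetical sketch, NOT the real pr111751.c: a fixed-trip-count
   element-wise add followed by a self-check.  */
int
foo2_sketch (void)
{
  int a[N], b[N], c[N];

  /* Assumed initial values; the real test loads constants from memory.  */
  for (int i = 0; i < N; i++)
    {
      a[i] = i;
      b[i] = 2 * i;
    }

  /* The trip count is a compile-time constant, so a fixed-length (VLS)
     vector mode can cover the arrays without a vsetvli-controlled loop.  */
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];

  /* Every check is decidable at compile time once the add is folded.  */
  for (int i = 0; i < N; i++)
    if (c[i] != a[i] + b[i])
      abort ();

  return 0;
}
```

When the whole computation is expressed with fixed-length vectors, later passes can see through it and fold the function down to returning 0, which is what the scan-assembler checks above expect.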

Before this patch:

foo2:
addi sp,sp,-208
addi a2,sp,64
addi a5,sp,128
lui a6,%hi(.LANCHOR0)
sd  ra,200(sp)
addi a6,a6,%lo(.LANCHOR0)
mv  a0,a2
mv  a1,a5
li  a3,16
mv  a4,sp
vsetivli zero,8,e64,m8,ta,ma
vle64.v v8,0(a6)
vse64.v v8,0(a2)
vse64.v v8,0(a5)
.L4:
vsetvli a5,a3,e32,m1,ta,ma
slli a2,a5,2
vle32.v v2,0(a1)
vle32.v v1,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a1,a1,a2
add a0,a0,a2
add a4,a4,a2
bne a3,zero,.L4
lw  a4,128(sp)
lw  a5,64(sp)
addw a5,a5,a4
lw  a4,0(sp)
bne a4,a5,.L5
lw  a4,132(sp)
lw  a5,68(sp)
addw a5,a5,a4
lw  a4,4(sp)
bne a4,a5,.L5
lw  a4,136(sp)
lw  a5,72(sp)
addw a5,a5,a4
lw  a4,8(sp)
bne a4,a5,.L5
lw  a4,140(sp)
lw  a5,76(sp)
addw a5,a5,a4
lw  a4,12(sp)
bne a4,a5,.L5
lw  a4,144(sp)
lw  a5,80(sp)
addw a5,a5,a4
lw  a4,16(sp)
bne a4,a5,.L5
lw  a4,148(sp)
lw  a5,84(sp)
addw a5,a5,a4
lw  a4,20(sp)
bne a4,a5,.L5
lw  a4,152(sp)
lw  a5,88(sp)
addw a5,a5,a4
lw  a4,24(sp)
bne a4,a5,.L5
lw  a4,156(sp)
lw  a5,92(sp)
addw a5,a5,a4
lw  a4,28(sp)
bne a4,a5,.L5
lw  a4,160(sp)
lw  a5,96(sp)
addw a5,a5,a4
lw  a4,32(sp)
bne a4,a5,.L5
lw  a4,164(sp)
lw  a5,100(sp)
addw a5,a5,a4
lw  a4,36(sp)
bne a4,a5,.L5
lw  a4,168(sp)
lw  a5,104(sp)
addw a5,a5,a4
lw  a4,40(sp)
bne a4,a5,.L5
lw  a4,172(sp)
lw  a5,108(sp)
addw a5,a5,a4
lw  a4,44(sp)
bne a4,a5,.L5
lw  a4,176(sp)
lw  a5,112(sp)
addw a5,a5,a4
lw  a4,48(sp)
bne a4,a5,.L5
lw  a4,180(sp)
lw  a5,116(sp)
addw a5,a5,a4
lw  a4,52(sp)
bne a4,a5,.L5
lw  a4,184(sp)
lw  a5,120(sp)
addw a5,a5,a4
lw  a4,56(sp)
bne a4,a5,.L5
lw  a4,188(sp)
lw  a5,124(sp)
addw a5,a5,a4
lw  a4,60(sp)
bne a4,a5,.L5
ld  ra,200(sp)
li  a0,0
addi sp,sp,208
jr  ra
.L5:
call abort

After this patch:

li  a0,0
ret

The heuristic is borrowed from ARM SVE. It is fully tested, and we confirmed
that we have the same behavior as ARM SVE GCC and RVV Clang.
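
To illustrate the general idea of such a heuristic, here is a minimal, self-contained sketch. It is a simplified model only, not the code in riscv-vector-costs.cc or in the aarch64 backend: the struct, the function name prefer_vls_sketch, and the threshold MAX_UNROLLED_STMTS are all hypothetical. It assumes the decision favours a fixed-length mode when the iteration count is known at compile time and the fully unrolled fixed-length loop stays small.

```c
#include <stdbool.h>

/* Hypothetical summary of a candidate loop; not a GCC data structure.  */
struct loop_summary
{
  bool niters_known;        /* Trip count is a compile-time constant.  */
  unsigned niters;          /* That trip count, when known.  */
  unsigned stmts_per_iter;  /* Vector statements in one loop body.  */
};

/* Assumed size limit for the unrolled body, chosen for illustration.  */
enum { MAX_UNROLLED_STMTS = 200 };

/* Return true if, in this simplified model, a fixed-length (VLS) mode
   with VLS_LANES elements per vector should be preferred over a
   length-agnostic (VLA) loop.  */
bool
prefer_vls_sketch (const struct loop_summary *loop, unsigned vls_lanes)
{
  /* Without a known trip count the loop cannot be fully unrolled.  */
  if (!loop->niters_known || vls_lanes == 0)
    return false;

  /* Number of fixed-length vector iterations, rounded up.  */
  unsigned vec_iters = (loop->niters + vls_lanes - 1) / vls_lanes;

  /* Prefer VLS only while the unrolled code stays under the limit.  */
  return vec_iters * loop->stmts_per_iter <= MAX_UNROLLED_STMTS;
}
```

Under these assumptions, a 16-iteration loop with a handful of statements is a clear VLS candidate, while a loop with an unknown or very large trip count stays with VLA.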

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::analyze_loop_vinfo): New function.
(costs::record_potential_vls_unrolling): Ditto.
(costs::prefer_unrolled_loop): Ditto.
(costs::better_main_loop_than_p): Ditto.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv-vector-costs.h (enum cost_type_enum): New enum.
* config/riscv/t-riscv: Add new include files.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111313.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 134 +-
 gcc/config/riscv/riscv-vector-costs.h |  43 ++
 gcc/config/riscv/t-ri