This backport is the first of two required for the pr111935 testcase,
already backported to gcc-13, to pass on riscv64-elf and riscv32-elf.
The V_VLS mode iterator, used in the original patch, is not available in
gcc-13, and I thought that would be too much to backport (and maybe so
are these two patches, WDYT?), so I changed it to V, to match the
preexisting gcc-13 pattern. Regstrapped on x86_64-linux-gnu, along with
other backports, and tested manually on riscv64-elf. Ok to install?
From: Lehua Ding <lehua.d...@rivai.ai>
Hi,
This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
pattern to optimize the special case where the scalar operand is zero.
Currently, a broadcast pattern whose scalar operand is an immediate is
converted from vmv.s.x to vmv.v.i, and its mask operand is converted
from 00..01 to 11..11. Each form has advantages and disadvantages.
After discussing them offline with Juzhe, we chose not to do this
transform.
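To make the two forms concrete, here is a minimal C model of what each
instruction writes. It is a sketch, not GCC code: NELEM, the function
names, and the choice to leave tail elements undisturbed are assumptions
made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NELEM 4 /* assumed number of 32-bit elements per register */

/* Model of vmv.s.x: element 0 receives the scalar when vl > 0.
   The remaining elements are tail elements; this model leaves them
   undisturbed, which is one behaviour the tail policy permits. */
void model_vmv_s_x(int32_t vd[NELEM], int32_t x, size_t vl)
{
    if (vl > 0)
        vd[0] = x;
}

/* Model of vmv.v.i: elements [0, vl) receive the immediate. */
void model_vmv_v_i(int32_t vd[NELEM], int32_t imm, size_t vl)
{
    for (size_t i = 0; i < vl && i < NELEM; i++)
        vd[i] = imm;
}

/* Returns 1 when vmv.s.x executed with vl_s leaves the same register
   contents as vmv.v.i executed with vl_v, starting from the same state. */
int same_result(int32_t imm, size_t vl_s, size_t vl_v)
{
    int32_t a[NELEM] = { 10, 11, 12, 13 };
    int32_t b[NELEM] = { 10, 11, 12, 13 };

    model_vmv_s_x(a, imm, vl_s);
    model_vmv_v_i(b, imm, vl_v);
    return memcmp(a, b, sizeof a) == 0;
}
```

In this model same_result(0, 4, 1) returns 1 and same_result(0, 4, 4)
returns 0: vmv.v.i only stands in for vmv.s.x when vl is exactly 1,
whereas vmv.s.x gives the same result for any non-zero vl.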
Before:
Advantages: The vsetvli info required by vmv.s.x has better compatibility,
since vmv.s.x only requires SEW and whether VL is zero or non-zero. That
means there are more opportunities to combine it with other vsetvl infos
in the vsetvl pass.
Disadvantages: For a non-zero scalar immediate, one more `li rd, imm`
instruction is needed.
After:
Advantages: No `li rd, imm` instruction is needed, since vmv.v.i supports
an immediate operand.
Disadvantages: The inverse of the advantages listed under Before. Worse
compatibility means more vsetvl instructions are needed.
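The trade-off can be sketched as a small compatibility check. This is a
simplified, hypothetical model and not the data structures of GCC's vsetvl
pass: struct vcfg, enum demand_kind and both functions are names invented
for this sketch, and only SEW and VL are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the vector configuration in effect. */
struct vcfg {
    unsigned sew; /* selected element width in bits */
    size_t vl;    /* active vector length */
};

/* Hypothetical demand kinds, named for this sketch only. */
enum demand_kind {
    DEMAND_SCALAR_MOVE, /* vmv.s.x: SEW and "vl is non-zero" */
    DEMAND_SPLAT_VL1    /* vmv.v.i used as a scalar move: SEW and vl == 1 */
};

/* True when the configuration already in effect satisfies the demand,
   so no new vsetvl instruction has to be emitted in front of the insn. */
bool cfg_satisfies(struct vcfg cur, enum demand_kind kind, unsigned sew)
{
    if (cur.sew != sew)
        return false;
    switch (kind) {
    case DEMAND_SCALAR_MOVE:
        return cur.vl != 0;
    case DEMAND_SPLAT_VL1:
        return cur.vl == 1;
    }
    return false;
}

/* Extra instructions needed to materialise the scalar for vmv.s.x:
   zero comes from the hard-wired zero register, anything else needs li. */
unsigned li_insns_for_vmv_s_x(long imm)
{
    return imm == 0 ? 0 : 1;
}
```

With an assumed configuration of SEW=32 and VL=4, cfg_satisfies accepts
DEMAND_SCALAR_MOVE but rejects DEMAND_SPLAT_VL1, so only the vmv.v.i form
would need its own vsetvl. For a zero operand li_insns_for_vmv_s_x returns
0, which is why the zero case gets a dedicated vmv.s.x pattern.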
Consider the C code below and the asm generated for it after
auto-vectorization. There is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converting vmv.s.x to vmv.v.i.
```
int foo1(int *restrict a, int *restrict b, int *restrict c, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```
asm (Before):
```
foo1:
ble a3,zero,.L7
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L6:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L6
vsetvli a2,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L7:
li a0,0
ret
```
asm (After):
```
foo1:
ble a3,zero,.L4
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,0
vsetvli a2,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
```
Best,
Lehua
Co-Authored-By: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
gcc/ChangeLog:
* config/riscv/predicates.md (vector_const_0_operand): New.
* config/riscv/vector.md (*pred_broadcast<mode>_zero): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.