Oh, I see. I think I got this wrong: I should adjust the cost for VEC_EXTRACT, not VEC_SET.
But it's odd: I didn't see the loop vectorizer scanning the scalar_to_vec cost in the vect dump.

The vect tree:

  # a.4_25 = PHI <1(2), _4(11)>
  # ivtmp_30 = PHI <18(2), ivtmp_20(11)>
  # vect_vec_iv_.10_137 = PHI <{ 1, 2, 3, ... }(2), vect_vec_iv_.10_137(11)>
  # ivtmp_149 = PHI <0(2), ivtmp_150(11)>
  # loop_len_146 = PHI <18(2), _155(11)>
  vect_patt_28.11_139 = (vector([2048,2048]) unsigned short) vect_vec_iv_.10_137;
  _22 = (int) a.4_25;
  vect_patt_26.12_141 = MIN_EXPR <vect_patt_28.11_139, { 15, ... }>;
  vect_patt_10.13_143 = { 32872, ... } >> vect_patt_26.12_141;
  _12 = 32872 >> _22;
  vect_patt_27.14_144 = VIEW_CONVERT_EXPR<vector([2048,2048]) short int>(vect_patt_10.13_143);
  b_7 = (short int) _12;
  _4 = a.4_25 + 1;
  ivtmp_20 = ivtmp_30 - 1;
  ivtmp_150 = ivtmp_149 + POLY_INT_CST [2048, 2048];
  _153 = MIN_EXPR <ivtmp_150, 18>;
  _154 = 18 - _153;
  _155 = MIN_EXPR <_154, POLY_INT_CST [2048, 2048]>;
  if (_155 != 0)
    goto <bb 11>; [0.00%]
  else
    goto <bb 16>; [100.00%]

  <bb 16> [local count: 118111600]:
  # vect_patt_27.14_145 = PHI <vect_patt_27.14_144(8)>
  # loop_len_156 = PHI <loop_len_146(8)>
  _147 = loop_len_156 + 18446744073709551615;
  _148 = .VEC_EXTRACT (vect_patt_27.14_145, _147);
  b_5 = _148;
  a = 19;
  _14 = b_5 != 0;
  _15 = (int) _14;
  return _15;

The cost computation in the vect dump only includes vector_stmt and scalar_to_vec. It seems it doesn't consider the VEC_EXTRACT cost?
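As a rough illustration of why the missing VEC_EXTRACT term matters, here is a toy profitability check in C. This is not GCC's costing code: the structs, the function names (toy_scalar_cost, toy_vector_cost, toy_profitable) and all statement counts are hypothetical, and only the field names echo the entries of the tuning table. The point is only that leaving the epilogue extract out of the sum can flip the decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-kind costs; only the names echo the tuning table. */
struct toy_costs {
  int scalar_stmt;
  int vector_stmt;
  int scalar_to_vec;  /* splat of a GPR/FPR into a vector register */
  int vec_to_scalar;  /* extract of one element, e.g. .VEC_EXTRACT */
};

/* Hypothetical statement counts for one candidate loop. */
struct toy_loop {
  int scalar_iters;   /* iterations of the scalar loop */
  int scalar_stmts;   /* scalar statements per iteration */
  int vector_iters;   /* iterations of the vector loop */
  int vector_stmts;   /* vector statements per iteration */
  int splats;         /* scalar_to_vec operations in the prologue */
  int extracts;       /* vec_to_scalar operations in the epilogue */
};

/* Total cost of the scalar loop. */
int toy_scalar_cost(const struct toy_costs *c, const struct toy_loop *l)
{
  assert(l->scalar_iters >= 0 && l->scalar_stmts >= 0);
  return l->scalar_iters * l->scalar_stmts * c->scalar_stmt;
}

/* Total cost of the vector loop; the epilogue extract is optional so
   that both variants of the sum can be compared. */
int toy_vector_cost(const struct toy_costs *c, const struct toy_loop *l,
                    bool count_extract)
{
  int cost = l->vector_iters * l->vector_stmts * c->vector_stmt;
  cost += l->splats * c->scalar_to_vec;
  if (count_extract)
    cost += l->extracts * c->vec_to_scalar;
  return cost;
}

/* Vectorize only if the vector loop is strictly cheaper. */
bool toy_profitable(const struct toy_costs *c, const struct toy_loop *l,
                    bool count_extract)
{
  return toy_vector_cost(c, l, count_extract) < toy_scalar_cost(c, l);
}
```

With made-up inputs such as costs {1, 1, 1, 1} and a loop {2, 2, 1, 3, 0, 1}, the scalar cost is 4 and the vector cost is 3 without the extract but 4 with it, so the toy check says "vectorize" only when the extract is ignored.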

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2024-01-11 17:18
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3

On Thu, Jan 11, 2024 at 9:24 AM Juzhe-Zhong <juzhe.zh...@rivai.ai> wrote:
>
> This patch fixes the following inefficient vectorized codes:
>
>         vsetvli a5,zero,e8,mf2,ta,ma
>         li      a2,17
>         vid.v   v1
>         li      a4,-32768
>         vsetvli zero,zero,e16,m1,ta,ma
>         addiw   a4,a4,104
>         vmv.v.i v3,15
>         lui     a1,%hi(a)
>         li      a0,19
>         vsetvli zero,zero,e8,mf2,ta,ma
>         vadd.vx v1,v1,a2
>         sb      a0,%lo(a)(a1)
>         vsetvli zero,zero,e16,m1,ta,ma
>         vzext.vf2       v2,v1
>         vmv.v.x v1,a4
>         vminu.vv        v2,v2,v3
>         vsrl.vv v1,v1,v2
>         vslidedown.vi   v1,v1,1
>         vmv.x.s a0,v1
>         snez    a0,a0
>         ret
>
> The reason is scalar_to_vec_cost is too low.
>
> Consider in VEC_SET, we always have a slide + scalar move instruction,
> scalar_to_vec_cost = 1 (current cost) is not reasonable.

scalar_to_vec is supposed to model a splat of GPR/FPR to a vector register.

We probably want to overhaul the cost classes, esp. vec_to_scalar,
but of course not now.

> I tried to set it as 2 but failed fix this case, that is, I need to
> set it as 3 to fix this case.
>
> No matter scalar move or slide instruction, I believe they are more costly
> than normal vector instructions (e.g. vadd.vv). So set it as 3 looks
> reasonable
> to me.
>
> After this patch:
>
>         lui     a5,%hi(a)
>         li      a4,19
>         sb      a4,%lo(a)(a5)
>         li      a0,0
>         ret
>
> Tested on both RV32/RV64 no regression, Ok for trunk ?
>
>         PR target/113281
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv.cc: Set scalar_to_vec_cost as 3.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/pr113209.c: Adapt test.
>         * gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c: New test.
>
> ---
>  gcc/config/riscv/riscv.cc                      |  4 ++--
>  .../vect/costmodel/riscv/rvv/pr113281-1.c      | 18 ++++++++++++++++++
>  .../gcc.target/riscv/rvv/autovec/pr113209.c    |  2 +-
>  3 files changed, 21 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index df9799d9c5e..bcfb3c15a39 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -366,7 +366,7 @@ static const common_vector_cost rvv_vls_vector_cost = {
>    1, /* gather_load_cost */
>    1, /* scatter_store_cost */
>    1, /* vec_to_scalar_cost */
> -  1, /* scalar_to_vec_cost */
> +  3, /* scalar_to_vec_cost */
>    1, /* permute_cost */
>    1, /* align_load_cost */
>    1, /* align_store_cost */
> @@ -382,7 +382,7 @@ static const scalable_vector_cost rvv_vla_vector_cost = {
>    1, /* gather_load_cost */
>    1, /* scatter_store_cost */
>    1, /* vec_to_scalar_cost */
> -  1, /* scalar_to_vec_cost */
> +  3, /* scalar_to_vec_cost */
>    1, /* permute_cost */
>    1, /* align_load_cost */
>    1, /* align_store_cost */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
> new file mode 100644
> index 00000000000..331cf961a1f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize
> -fdump-tree-vect-details" } */
> +
> +unsigned char a;
> +
> +int main() {
> +  short b = a = 0;
> +  for (; a != 19; a++)
> +    if (a)
> +      b = 32872 >> a;
> +
> +  if (b == 0)
> +    return 0;
> +  else
> +    return 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> index 081ee369394..70aae151000 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
> -fno-vect-cost-model" } */
>
>  int b, c, d, f, i, a;
>  int e[1] = {0};
> --
> 2.36.3
>