Ok. Address comment and V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617821.html
Thanks. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-05-08 17:53 To: juzhe.zh...@rivai.ai CC: gcc-patches Subject: Re: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743] I am wondering if it is possible to do this on local_eliminate_vsetvl_insn? I feel this is sort of local elimination, so putting them together would be better than handling that in many different places. On Mon, May 8, 2023 at 9:35 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote: > > Gentle ping this patch. > > Is this Ok for trunk? Thanks. > > > juzhe.zh...@rivai.ai > > From: juzhe.zhong > Date: 2023-05-06 19:14 > To: gcc-patches > CC: kito.cheng; Juzhe-Zhong > Subject: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user > vsetvli [PR 109743] > From: Juzhe-Zhong <juzhe.zh...@rivai.ai> > > This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743. > > This issue happens is because we are currently very conservative in > optimization of user vsetvli. > > Consider this following case: > > bb 1: > vsetvli a5,a4... (demand AVL = a4). > bb 2: > RVV insn use a5 (demand AVL = a5). > > LCM will hoist vsetvl of bb 2 into bb 1. > We don't do AVL propagation for this situation since it's complicated that > we should analyze the code sequence between vsetvli in bb 1 and RVV insn in > bb 2. > They are not necessary the consecutive blocks. > > This patch is doing the optimizations after LCM, we will check and eliminate > the vsetvli > in LCM inserted edge if such vsetvli is redundant. Such approach is much > simplier and safe. > > code: > void > foo2 (int32_t *a, int32_t *b, int n) > { > if (n <= 0) > return; > int i = n; > size_t vl = __riscv_vsetvl_e32m1 (i); > > for (; i >= 0; i--) > { > vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); > __riscv_vse32_v_i32m1 (b, v, vl); > > if (i >= vl) > continue; > > if (i == 0) > return; > > vl = __riscv_vsetvl_e32m1 (i); > } > } > > Before this patch: > foo2: > .LFB2: > .cfi_startproc > ble a2,zero,.L1 > mv a4,a2 > li a3,-1 > vsetvli a5,a2,e32,m1,ta,mu > vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. > .L5: > vle32.v v1,0(a0) > vse32.v v1,0(a1) > bgeu a4,a5,.L3 > .L10: > beq a2,zero,.L1 > vsetvli a5,a4,e32,m1,ta,mu > addi a4,a4,-1 > vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. > vle32.v v1,0(a0) > vse32.v v1,0(a1) > addiw a2,a2,-1 > bltu a4,a5,.L10 > .L3: > addiw a2,a2,-1 > addi a4,a4,-1 > bne a2,a3,.L5 > .L1: > ret > > After this patch: > f: > ble a2,zero,.L1 > mv a4,a2 > li a3,-1 > vsetvli a5,a2,e32,m1,ta,ma > .L5: > vle32.v v1,0(a0) > vse32.v v1,0(a1) > bgeu a4,a5,.L3 > .L10: > beq a2,zero,.L1 > vsetvli a5,a4,e32,m1,ta,ma > addi a4,a4,-1 > vle32.v v1,0(a0) > vse32.v v1,0(a1) > addiw a2,a2,-1 > bltu a4,a5,.L10 > .L3: > addiw a2,a2,-1 > addi a4,a4,-1 > bne a2,a3,.L5 > .L1: > ret > > PR target/109743 > > gcc/ChangeLog: > > * config/riscv/riscv-vsetvl.cc (pass_vsetvl::commit_vsetvls): Add > optimization for LCM inserted edge. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test. > * gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test. > > --- > gcc/config/riscv/riscv-vsetvl.cc | 42 +++++++++++++++++++ > .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 26 ++++++++++++ > .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c | 27 ++++++++++++ > .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 28 +++++++++++++ > .../gcc.target/riscv/rvv/vsetvl/pr109743-4.c | 28 +++++++++++++ > 5 files changed, 151 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c > > diff --git a/gcc/config/riscv/riscv-vsetvl.cc > b/gcc/config/riscv/riscv-vsetvl.cc > index f55907a410e..fcee7fdf323 100644 > --- a/gcc/config/riscv/riscv-vsetvl.cc > +++ b/gcc/config/riscv/riscv-vsetvl.cc > @@ -3834,6 +3834,48 @@ pass_vsetvl::commit_vsetvls (void) > const vector_insn_info *require > = m_vector_manager->vector_exprs[i]; > gcc_assert (require->valid_or_dirty_p ()); > + > + /* Here we optimize the VSETVL is hoisted by LCM: > + > + Before LCM: > + bb 1: > + vsetvli a5,a2,e32,m1,ta,mu > + bb 2: > + vsetvli zero,a5,e32,m1,ta,mu > + ... > + > + After LCM: > + bb 1: > + vsetvli a5,a2,e32,m1,ta,mu > + LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate > + bb 2: > + ... > + */ > + const basic_block pred_cfg_bb = eg->src; > + const auto block_info > + = m_vector_manager->vector_block_infos[pred_cfg_bb->index]; > + const insn_info *pred_insn = block_info.reaching_out.get_insn (); > + if (pred_insn && vsetvl_insn_p (pred_insn->rtl ()) > + && require->get_avl_source () > + && require->get_avl_source ()->insn () > + && require->skip_avl_compatible_p (block_info.reaching_out)) > + { > + vector_insn_info new_info = *require; > + new_info.set_avl_info ( > + block_info.reaching_out.get_avl_info ()); > + new_info > + = block_info.reaching_out.merge (new_info, LOCAL_MERGE); > + change_vsetvl_insn (pred_insn, new_info); > + bitmap_clear_bit (m_vector_manager->vector_insert[ed], i); > + if (dump_file) > + fprintf ( > + dump_file, > + "\nLCM INSERTED edge %d from bb %d to bb %d for VSETVL " > + "expr[%ld] is removed\n", > + ed, eg->src->index, eg->dest->index, i); > + continue; > + } > + > rtl_profile_for_edge (eg); > start_sequence (); > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c > new file mode 100644 > index 00000000000..f30275c8280 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize > -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f (int32_t * a, int32_t * b, int n) > +{ > + if (n <= 0) > + return; > + int i = n; > + size_t vl = __riscv_vsetvl_e32m1 (i); > + for (; i >= 0; i--) > + { > + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); > + __riscv_vse32_v_i32m1 (b, v, vl); > + > + if (i >= vl) > + continue; > + if (i == 0) > + return; > + vl = __riscv_vsetvl_e32m1 (i); > + } > +} > + > +/* { dg-final { scan-assembler-times > {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target > { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" > no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" > no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c > b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c > new file mode 100644 > index 00000000000..5f6647bb916 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize > -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f (int32_t * a, int32_t * b, int n) > +{ > + if (n <= 0) > + return; > + int i = n; > + size_t vl = __riscv_vsetvl_e8mf4 (i); > + for (; i >= 0; i--) > + { > + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); > + __riscv_vse32_v_i32m1 (b, v, vl); > + > + if (i >= vl) > + continue; > + if (i == 0) > + return; > + vl = __riscv_vsetvl_e32m1 (i); > + } > +} > + > + > +/* { dg-final { scan-assembler-times > {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target > { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" > no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" > no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c > b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c > new file mode 100644 > index 00000000000..5dbc871ed12 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize > -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f (int32_t * a, int32_t * b, int n) > +{ > + if (n <= 0) > + return; > + int i = n; > + size_t vl = __riscv_vsetvl_e8mf2 (i); > + for (; i >= 0; i--) > + { > + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); > + __riscv_vse32_v_i32m1 (b, v, vl); > + > + if (i >= vl) > + continue; > + if (i == 0) > + return; > + vl = __riscv_vsetvl_e32m1 (i); > + } > +} > + > +/* { dg-final { scan-assembler-times > {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 { target > { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" > no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times > {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target > { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" > no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times > {vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { > no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" > no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c > b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c > new file mode 100644 > index 00000000000..edd12855f58 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize > -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void > +f (int32_t *a, int32_t *b, int n) > +{ > + if (n <= 0) > + return; > + int i = n; > + size_t vl = __riscv_vsetvl_e8mf4 (i); > + for (; i >= 0; i--) > + { > + vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl); > + v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl); > + __riscv_vse32_v_i32m1 (b + i, v, vl); > + > + if (i >= vl) > + continue; > + if (i == 0) > + return; > + vl = __riscv_vsetvl_e8mf4 (i); > + } > +} > + > +/* { dg-final { scan-assembler-times > {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*tu,\s*m[au]} 2 { target { > no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" > no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts > "-funroll-loops" } } } } */ > -- > 2.36.3 >