RE: [PATCH] RISC-V: Use merge approach to optimize vector permutation
Addressed the comments in PATCH v2 as below. https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621789.html Pan -Original Message- From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches Sent: Thursday, June 15, 2023 3:11 AM To: Robin Dapp ; juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; pal...@rivosinc.com Subject: Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation On 6/14/23 09:00, Robin Dapp wrote: > Hi Juzhe, > > the general method seems sane and useful (it's not very complicated). > I was just distracted by > >> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the >> common expression: >> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } >> >> For this selector, we can use vmsltu + vmerge to optimize the codegen. > > because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe > { 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }. > > Because of the ascending/monotonic? selector structure we can use vmerge > instead of vrgather. > >> +/* Recognize the patterns that we can use merge operation to shuffle the >> + vectors. The value of Each element (index i) in selector can only be >> + either i or nunits + i. >> + >> + E.g. >> + v = VEC_PERM_EXPR (v0, v1, selector), >> + selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } > > Same. > >> + >> + We can transform such pattern into: >> + >> + v = vcond_mask (v0, v1, mask), >> + mask = { 0, 1, 0, 1, 0, 1, ... }. */ >> + >> +static bool >> +shuffle_merge_patterns (struct expand_vec_perm_d *d) >> +{ >> + machine_mode vmode = d->vmode; >> + machine_mode sel_mode = related_int_vector_mode (vmode).require (); >> + int n_patterns = d->perm.encoding ().npatterns (); >> + poly_int64 vec_len = d->perm.length (); >> + >> + for (int i = 0; i < n_patterns; ++i) >> +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i)) >> + return false; >> + >> + for (int i = n_patterns; i < n_patterns * 2; i++) >> +if (!d->perm.series_p (i, n_patterns, i, n_patterns) >> +&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns)) >> + return false; > > Maybe add a comment that we check that the pattern is actually monotonic > or however you prefet to call it? > > I didn't go through all tests in detail but skimmed several. All in all > looks good to me. So I think that means we want a V2 for the comment updates. But I think we can go ahead and consider V2 pre-approved. jeff
Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation
On 6/14/23 09:00, Robin Dapp wrote: Hi Juzhe, the general method seems sane and useful (it's not very complicated). I was just distracted by Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the common expression: { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } For this selector, we can use vmsltu + vmerge to optimize the codegen. because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe { 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }. Because of the ascending/monotonic? selector structure we can use vmerge instead of vrgather. +/* Recognize the patterns that we can use merge operation to shuffle the + vectors. The value of Each element (index i) in selector can only be + either i or nunits + i. + + E.g. + v = VEC_PERM_EXPR (v0, v1, selector), + selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } Same. + + We can transform such pattern into: + + v = vcond_mask (v0, v1, mask), + mask = { 0, 1, 0, 1, 0, 1, ... }. */ + +static bool +shuffle_merge_patterns (struct expand_vec_perm_d *d) +{ + machine_mode vmode = d->vmode; + machine_mode sel_mode = related_int_vector_mode (vmode).require (); + int n_patterns = d->perm.encoding ().npatterns (); + poly_int64 vec_len = d->perm.length (); + + for (int i = 0; i < n_patterns; ++i) +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i)) + return false; + + for (int i = n_patterns; i < n_patterns * 2; i++) +if (!d->perm.series_p (i, n_patterns, i, n_patterns) + && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns)) + return false; Maybe add a comment that we check that the pattern is actually monotonic or however you prefet to call it? I didn't go through all tests in detail but skimmed several. All in all looks good to me. So I think that means we want a V2 for the comment updates. But I think we can go ahead and consider V2 pre-approved. jeff
Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation
Hi Juzhe, the general method seems sane and useful (it's not very complicated). I was just distracted by > Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the > common expression: > { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } > > For this selector, we can use vmsltu + vmerge to optimize the codegen. because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe { 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }. Because of the ascending/monotonic? selector structure we can use vmerge instead of vrgather. > +/* Recognize the patterns that we can use merge operation to shuffle the > + vectors. The value of Each element (index i) in selector can only be > + either i or nunits + i. > + > + E.g. > + v = VEC_PERM_EXPR (v0, v1, selector), > + selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } Same. > + > + We can transform such pattern into: > + > + v = vcond_mask (v0, v1, mask), > + mask = { 0, 1, 0, 1, 0, 1, ... }. */ > + > +static bool > +shuffle_merge_patterns (struct expand_vec_perm_d *d) > +{ > + machine_mode vmode = d->vmode; > + machine_mode sel_mode = related_int_vector_mode (vmode).require (); > + int n_patterns = d->perm.encoding ().npatterns (); > + poly_int64 vec_len = d->perm.length (); > + > + for (int i = 0; i < n_patterns; ++i) > +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i)) > + return false; > + > + for (int i = n_patterns; i < n_patterns * 2; i++) > +if (!d->perm.series_p (i, n_patterns, i, n_patterns) > + && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns)) > + return false; Maybe add a comment that we check that the pattern is actually monotonic or however you prefet to call it? I didn't go through all tests in detail but skimmed several. All in all looks good to me. Regards Robin
[PATCH] RISC-V: Use merge approach to optimize vector permutation
From: Juzhe-Zhong This patch is to optimize the permuation case that is suiteable use merge approach. Consider this following case: typedef int8_t vnx16qi __attribute__((vector_size (16))); #define MASK_16 0, 17, 2, 19, 4, 21, 6, 23, 8, 25, 10, 27, 12, 29, 14, 31 void __attribute__ ((noipa)) merge0 (vnx16qi x, vnx16qi y, vnx16qi *out) { vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16); *(vnx16qi*)out = v; } The gimple IR: v_3 = VEC_PERM_EXPR ; Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the common expression: { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... } For this selector, we can use vmsltu + vmerge to optimize the codegen. Before this patch: merge0: addia5,sp,16 vl1re8.vv3,0(a5) li a5,31 vsetivlizero,16,e8,m1,ta,mu vmv.v.x v2,a5 lui a5,%hi(.LANCHOR0) addia5,a5,%lo(.LANCHOR0) vl1re8.vv1,0(a5) vl1re8.vv4,0(sp) vand.vv v1,v1,v2 vmsgeu.vi v0,v1,16 vrgather.vv v2,v4,v1 vadd.vi v1,v1,-16 vrgather.vv v2,v3,v1,v0.t vs1r.v v2,0(a0) ret After this patch: merge0: addia5,sp,16 vl1re8.vv1,0(a5) lui a5,%hi(.LANCHOR0) addia5,a5,%lo(.LANCHOR0) vsetivlizero,16,e8,m1,ta,ma vl1re8.vv0,0(a5) vl1re8.vv2,0(sp) vmsltu.vi v0,v0,16 vmerge.vvm v1,v1,v2,v0 vs1r.v v1,0(a0) ret The key of this optimization is that: 1. mask = vmsltu (selector, nunits) 2. result = vmerge (op0, op1, mask) gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern. (expand_vec_perm_const_1): Add merge optmization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test. --- gcc/config/riscv/riscv-v.cc | 52 + .../riscv/rvv/autovec/vls-vlmax/merge-1.c | 101 + .../riscv/rvv/autovec/vls-vlmax/merge-2.c | 103 + .../riscv/rvv/autovec/vls-vlmax/merge-3.c | 109 + .../riscv/rvv/autovec/vls-vlmax/merge-4.c | 122 ++ .../riscv/rvv/autovec/vls-vlmax/merge-5.c | 76 +++ .../riscv/rvv/autovec/vls-vlmax/merge-6.c | 51 + .../riscv/rvv/autovec/vls-vlmax/merge-7.c | 25 +++ .../riscv/rvv/autovec/vls-vlmax/merge_run-1.c | 119 ++ .../riscv/rvv/autovec/vls-vlmax/merge_run-2.c | 121 ++ .../riscv/rvv/autovec/vls-vlmax/merge_run-3.c | 150 + .../riscv/rvv/autovec/vls-vlmax/merge_run-4.c | 210 ++ .../riscv/rvv/autovec/vls-vlmax/merge_run-5.c | 89 .../riscv/rvv/autovec/vls-vlmax/merge_run-6.c | 59 + .../riscv/rvv/autovec/vls-vlmax/merge_run-7.c | 29 +++ 15 files changed, 1416 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c create mode 100644 g