Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of Jeff Law via Gcc-patches
Sent: Tuesday, June 13, 2023 3:43 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp....@gmail.com
Subject: Re: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

On 6/12/23 09:11, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
>
> According to RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
>
> We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing
> vdecompress) Decompress operation.
>
> Case 1 (nunits = POLY_INT_CST [16, 16]):
> _48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1,
> POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
> We can optimize such VLA SLP permutation pattern into:
> _48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });
>
> Case 2 (nunits = POLY_INT_CST [16, 16]):
> _23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3,
> 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1],
> POLY_INT_CST [5, 3], ... }>;
> We can optimize such VLA SLP permutation pattern into:
> _48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2
> nunits), mask = { 0, 1, 0, 1, ... });
>
> For example:
> void __attribute__ ((noinline, noclone))
> vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       a[i * 2] += b;
>       a[i * 2 + 1] += c;
>     }
> }
>
> ASM:
> ...
>         vid.v   v0
>         vand.vi v0,v0,1
>         vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
> vdecompress:
>         viota.m v3,v0
>         vrgather.vv     v2,v1,v3,v0.t
> Loop:
>         vsetvli zero,a5,e64,m1,ta,ma
>         vle64.v v1,0(a0)
>         vsetvli a6,zero,e64,m1,ta,ma
>         vadd.vv v1,v2,v1
>         vsetvli zero,a5,e64,m1,ta,ma
>         mv      a5,a3
>         vse64.v v1,0(a0)
>         add     a3,a3,a1
>         add     a0,a0,a2
>         bgtu    a5,a4,.L4
>
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
>         (shuffle_decompress_patterns): New function.
>         (expand_vec_perm_const_1): Add decompress optimization.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

I've been wanting to get inside expand_vec_perm_const to see what
opportunities might exist to improve code in there.  We had good success
mining this space at a prior employer.  While we had a lot of weird
idioms and costs to consider, it was well worth the time.

So quite happy to see you diving into this code.

OK for the trunk,
Jeff