Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of Jeff Law via Gcc-patches
Sent: Tuesday, June 13, 2023 3:43 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp....@gmail.com
Subject: Re: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation



On 6/12/23 09:11, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> 
> According to RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
> 
> We can enhance VLA SLP auto-vectorization with the decompress operation
> (section 16.5.1, "Synthesizing vdecompress").
> 
> Case 1 (nunits = POLY_INT_CST [16, 16]):
> _48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1,
>   POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
> 
> We can optimize such a VLA SLP permutation pattern into:
> _48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });
> 
> Case 2 (nunits = POLY_INT_CST [16, 16]):
> _23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3],
>   POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1],
>   POLY_INT_CST [5, 3], ... }>;
> 
> We can optimize such a VLA SLP permutation pattern into:
> _23 = vdecompress (slidedown (_46, 1/2 nunits), slidedown (_44, 1/2 nunits),
>   mask = { 0, 1, 0, 1, ... });
> 
> For example:
> #include <stdint.h>
> 
> void __attribute__ ((noinline, noclone))
> vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
> {
>    for (int i = 0; i < n; ++i)
>      {
>        a[i * 2] += b;
>        a[i * 2 + 1] += c;
>      }
> }
> 
> ASM:
> ...
>          vid.v   v0
>          vand.vi v0,v0,1
>          vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
> vdecompress:
>          viota.m v3,v0
>          vrgather.vv     v2,v1,v3,v0.t
> Loop:
>          vsetvli zero,a5,e64,m1,ta,ma
>          vle64.v v1,0(a0)
>          vsetvli a6,zero,e64,m1,ta,ma
>          vadd.vv v1,v2,v1
>          vsetvli zero,a5,e64,m1,ta,ma
>          mv      a5,a3
>          vse64.v v1,0(a0)
>          add     a3,a3,a1
>          add     a0,a0,a2
>          bgtu    a5,a4,.L4
> 
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
>          (shuffle_decompress_patterns): New function.
>          (expand_vec_perm_const_1): Add decompress optimization.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

I've been wanting to get inside expand_vec_perm_const to see what opportunities
might exist to improve code in there.  We had good success mining this space at
a prior employer.  While we had a lot of weird idioms and costs to consider, it
was well worth the time.

So quite happy to see you diving into this code.

OK for the trunk,
Jeff
