RE: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

Li, Pan2 via Gcc-patches Mon, 12 Jun 2023 18:29:14 -0700

Committed, thanks Jeff.

Pan


-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf 
Of Jeff Law via Gcc-patches
Sent: Tuesday, June 13, 2023 3:43 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp....@gmail.com
Subject: Re: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with 
decompress operation



On 6/12/23 09:11, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> 
> According to RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
> 
> We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing 
> vdecompress) Decompress operation.
> 
> Case 1 (nunits = POLY_INT_CST [16, 16]):
> _48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, 
> POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>; We can optimize such 
> VLA SLP permuation pattern into:
> _48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... };
> 
> Case 2 (nunits = POLY_INT_CST [16, 16]):
> _23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 
> 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], 
> POLY_INT_CST [5, 3], ... }>; We can optimize such VLA SLP permuation pattern 
> into:
> _48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 
> nunits), mask = { 0, 1, 0, 1, ... };
> 
> For example:
> void __attribute__ ((noinline, noclone)) vec_slp (uint64_t *restrict 
> a, uint64_t b, uint64_t c, int n) {
>    for (int i = 0; i < n; ++i)
>      {
>        a[i * 2] += b;
>        a[i * 2 + 1] += c;
>      }
> }
> 
> ASM:
> ...
>          vid.v   v0
>          vand.vi v0,v0,1
>          vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
> vdecompress:
>          viota.m v3,v0
>          vrgather.vv     v2,v1,v3,v0.t
> Loop:
>          vsetvli zero,a5,e64,m1,ta,ma
>          vle64.v v1,0(a0)
>          vsetvli a6,zero,e64,m1,ta,ma
>          vadd.vv v1,v2,v1
>          vsetvli zero,a5,e64,m1,ta,ma
>          mv      a5,a3
>          vse64.v v1,0(a0)
>          add     a3,a3,a1
>          add     a0,a0,a2
>          bgtu    a5,a4,.L4
> 
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
>          (shuffle_decompress_patterns): New function.
>          (expand_vec_perm_const_1): Add decompress optimization.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.
I've been wanting to get inside expand_vec_perm_const to see what opportunities 
might exist to improve code in there.  We had good success mining this space at 
a prior employer.  While we had a lot of weird idioms and costs to consider it 
was well worth the time.

So quite happy to see you diving into this code.

OK for the trunk,
Jeff

RE: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

Reply via email to