Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so.  Could we condense this a bit?
OK. Do I need to send a V2, or can the commit log be condensed when the patch is merged?


>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we 
need a cost adjustment for combining extensions.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better, since 
it increases scalar register pressure.
So, for such combining, I would like to take another approach and combine this 
pattern carefully, with an accurate register pressure calculation.
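
For reference, the kind of source loop I mean is a vector plus a loop-invariant
scalar. This is only a minimal sketch: the function name add_scalar is made up
for illustration, and whether GCC emits vmv.v.x + vadd.vv or vadd.vx for it
depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add a loop-invariant scalar to every element.
   With vadd.vx the scalar x has to stay live in a scalar register for
   the whole loop; with vmv.v.x + vadd.vv it is broadcast into a vector
   register instead.  */
void
add_scalar (int32_t *restrict dst, const int32_t *restrict a,
            int32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + x;
}
```

Which form is cheaper depends on how many other scalar values are live in the
loop, which is why this needs a register pressure calculation.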

However, for this patch,

vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better.
I don't think it is possible for vwadd.vv to be worse.
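
As a sketch of the source shape behind this pattern (the function name
widen_add is made up for illustration, and I am not claiming which
instructions a particular GCC version actually emits for it): both narrow
operands are extended to the wider type and then added, so the two extensions
and the add can be replaced by one widening add.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: each int8_t operand is sign-extended to
   int16_t and the two extended values are added.  This is the
   extend + extend + add shape that a widening add replaces.  */
void
widen_add (int16_t *restrict dst, const int8_t *restrict a,
           const int8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int16_t) a[i] + (int16_t) b[i];
}
```

The widening form takes the narrow operands directly, so no extra scalar
register is involved, unlike the vadd.vx case.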

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-02 15:01
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
Hi Juzhe,
 
> ...
>         vsetvli zero,t1,e8,m1,ta,ma
>         vle8.v  v1,0(a4)
>         vsetvli t3,zero,e16,m2,ta,ma
>         vsext.vf2       v6,v1
>         vsetvli zero,t1,e8,m1,ta,ma
>         vle8.v  v1,0(a5)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a0,t4
>         vzext.vf2       v4,v1
>         vmul.vv v2,v4,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
>         vle8.v  v1,0(a6)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a1,t4
>         vzext.vf2       v2,v1
>         vmul.vv v4,v2,v4
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v4,0(t0)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a2,t4
>         vmul.vv v2,v2,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
>         add     t0,a3,t4
>         vle8.v  v1,0(a7)
>         vsetvli t3,zero,e16,m2,ta,ma
>         sub     t6,t6,t1
>         vsext.vf2       v2,v1
>         vmul.vv v2,v2,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
> ...
> 
> After this patch:
> ...
>         vsetvli zero,t1,e8,mf2,ta,ma
>         vle8.v  v1,0(a4)
>         vle8.v  v3,0(a5)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a0,t3
>         vwmulsu.vv      v2,v1,v3
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v2,0(t0)
>         vle8.v  v2,0(a6)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a1,t3
>         vwmulu.vv       v4,v3,v2
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v4,0(t0)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a2,t3
>         vwmulsu.vv      v3,v1,v2
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v3,0(t0)
>         add     t0,a3,t3
>         vle8.v  v3,0(a7)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         sub     t4,t4,t1
>         vwmul.vv        v2,v1,v3
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v2,0(t0)
> ...
 
I like the code examples in general but find them hard to read
at lengths > 5-10 or so.  Could we condense this a bit?
 
> +(include "autovec-opt.md")
ACK for this.  We discussed before that not cluttering the regular
autovec.md with combine-targeted patterns too much so I'm in favor
of the separate file.
 
In total looks good to me.  I'm a bit wary about getting the costs
right for combine patterns but we can deal with this later.
 
Regards
Robin
 
