Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so.  Could we condense this a bit?

OK. Do I need to send V2? Or should the commit log be condensed when the patch is merged?

>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we need costs to adjust how extensions are combined.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better, since it increases scalar register pressure. So for that kind of combining I would like to take another approach and combine the pattern carefully, with accurate register pressure calculation.

However, for this patch, vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better. I don't think it is possible for vwadd.vv to be worse.

Thanks.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-02 15:01
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

Hi Juzhe,

> ...
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a4)
> vsetvli t3,zero,e16,m2,ta,ma
> vsext.vf2 v6,v1
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a5)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a0,t4
> vzext.vf2 v4,v1
> vmul.vv v2,v4,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> vle8.v v1,0(a6)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a1,t4
> vzext.vf2 v2,v1
> vmul.vv v4,v2,v4
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v4,0(t0)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a2,t4
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> add t0,a3,t4
> vle8.v v1,0(a7)
> vsetvli t3,zero,e16,m2,ta,ma
> sub t6,t6,t1
> vsext.vf2 v2,v1
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> ...
>
> After this patch:
> ...
> vsetvli zero,t1,e8,mf2,ta,ma
> vle8.v v1,0(a4)
> vle8.v v3,0(a5)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a0,t3
> vwmulsu.vv v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> vle8.v v2,0(a6)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a1,t3
> vwmulu.vv v4,v3,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v4,0(t0)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a2,t3
> vwmulsu.vv v3,v1,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v3,0(t0)
> add t0,a3,t3
> vle8.v v3,0(a7)
> vsetvli t6,zero,e8,mf2,ta,ma
> sub t4,t4,t1
> vwmul.vv v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> ...

I like the code examples in general but find them hard to read
at lengths > 5-10 or so.  Could we condense this a bit?

> +(include "autovec-opt.md")

ACK for this.  We discussed before not cluttering the regular
autovec.md with combine-targeted patterns too much, so I'm in favor
of the separate file.

In total looks good to me.  I'm a bit wary about getting the costs
right for combine patterns but we can deal with this later.

Regards
 Robin
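
For readers without the patch at hand, the kind of source loop that the extension + multiply combine discussed above targets might look roughly like the C sketch below. This is a hypothetical illustration only: the function name `widen_mul` and its parameters are made up here and are not taken from the patch or its testsuite, and whether GCC actually emits the widening instructions named in the comments depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop, not taken from the patch or its testsuite.
   s and t are signed 8-bit inputs, u and v are unsigned 8-bit inputs.
   Each product is computed from extended operands and stored as 16 bits.  */
void
widen_mul (int16_t *ss, int16_t *su, uint16_t *uu,
           const int8_t *s, const int8_t *t,
           const uint8_t *u, const uint8_t *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* sign-extend * sign-extend: candidate for vwmul.vv  */
      ss[i] = (int16_t) s[i] * (int16_t) t[i];
      /* sign-extend * zero-extend: candidate for vwmulsu.vv  */
      su[i] = (int16_t) s[i] * (int16_t) u[i];
      /* zero-extend * zero-extend: candidate for vwmulu.vv  */
      uu[i] = (uint16_t) u[i] * (uint16_t) v[i];
    }
}
```

Every 8-bit by 8-bit product fits in 16 bits, so the widening form computes the same result as extending both operands first and then multiplying.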