Hi Juzhe,

> ...
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a4)
> vsetvli t3,zero,e16,m2,ta,ma
> vsext.vf2 v6,v1
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a5)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a0,t4
> vzext.vf2 v4,v1
> vmul.vv v2,v4,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> vle8.v v1,0(a6)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a1,t4
> vzext.vf2 v2,v1
> vmul.vv v4,v2,v4
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v4,0(t0)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a2,t4
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> add t0,a3,t4
> vle8.v v1,0(a7)
> vsetvli t3,zero,e16,m2,ta,ma
> sub t6,t6,t1
> vsext.vf2 v2,v1
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> ...
>
> After this patch:
> ...
> vsetvli zero,t1,e8,mf2,ta,ma
> vle8.v v1,0(a4)
> vle8.v v3,0(a5)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a0,t3
> vwmulsu.vv v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> vle8.v v2,0(a6)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a1,t3
> vwmulu.vv v4,v3,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v4,0(t0)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a2,t3
> vwmulsu.vv v3,v1,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v3,0(t0)
> add t0,a3,t3
> vle8.v v3,0(a7)
> vsetvli t6,zero,e8,mf2,ta,ma
> sub t4,t4,t1
> vwmul.vv v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> ...
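As a quick sanity check on why the extend-then-multiply sequences can be folded into single widening multiplies: this is my own scalar C model of one e8 to e16 element, not code from the patch, and all helper names are hypothetical. It assumes vwmulsu.vv treats vs2 as signed and vs1 as unsigned, matching the operand order in the quoted "after" code. Because every e8 x e8 product fits in 16 bits, the truncating 16-bit vmul.vv after vsext.vf2/vzext.vf2 yields the same bits as the widening form.

```c
#include <stdint.h>

/* Hypothetical scalar model of one element (not from the patch). */

/* "Before": operands already sign/zero-extended to 16 bits
   (vsext.vf2/vzext.vf2); a 16-bit vmul.vv keeps only the low 16 bits. */
static uint16_t mul_after_ext(uint16_t x, uint16_t y) {
    return (uint16_t)((uint32_t)x * y);
}

/* "After": widening multiplies producing the full 16-bit product. */
static int16_t  wmul(int8_t a, int8_t b)    { return (int16_t)(a * b); }  /* vwmul.vv   */
static uint16_t wmulu(uint8_t a, uint8_t b) { return (uint16_t)(a * b); } /* vwmulu.vv  */
static int16_t  wmulsu(int8_t a, uint8_t b) { return (int16_t)(a * b); }  /* vwmulsu.vv: signed vs2, unsigned vs1 */

/* Exhaustively compare both forms over all e8 input pairs;
   returns the number of mismatches. */
static int count_mismatches(void) {
    int bad = 0;
    for (int i = 0; i < 256; i++) {
        for (int j = 0; j < 256; j++) {
            int8_t  sa = (int8_t)i,  sb = (int8_t)j;
            uint8_t ua = (uint8_t)i, ub = (uint8_t)j;
            /* Model vsext.vf2 (sign-extend) and vzext.vf2 (zero-extend). */
            uint16_t sa16 = (uint16_t)(int16_t)sa;
            uint16_t sb16 = (uint16_t)(int16_t)sb;
            uint16_t ua16 = ua, ub16 = ub;
            if ((uint16_t)wmul(sa, sb) != mul_after_ext(sa16, sb16)) bad++;
            if (wmulu(ua, ub) != mul_after_ext(ua16, ub16)) bad++;
            if ((uint16_t)wmulsu(sa, ub) != mul_after_ext(sa16, ub16)) bad++;
        }
    }
    return bad;
}
```

count_mismatches() returns 0: the combined patterns are value-preserving for all inputs, so the open question is profitability rather than correctness.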
I like the code examples in general but find them hard to read once they
grow beyond 5-10 lines or so. Could we condense this a bit?

> +(include "autovec-opt.md")

ACK for this. We discussed before that we should not clutter the regular
autovec.md with too many combine-targeted patterns, so I'm in favor of
the separate file.

Overall this looks good to me. I'm a bit wary about getting the costs
right for combine patterns, but we can deal with that later.

Regards
 Robin