Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

Jeff Law via Gcc-patches Thu, 29 Jun 2023 16:39:37 -0700



On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb

So just to be explicit, I see no difference with that test before/after your proposed change. Nor would I expect one based on my understanding of the patch.

The explicit conversions I see are because we need the output of the conversion in multiple vfmul instructions. That won't be helped by the patch you've proposed.


To be more concrete:

        vsetvli t1,t5,e32,mf2,ta,ma     # 99    [c=0 l=4]  vsetvldi
        vle32.v v2,0(a4)        # 23    [c=4 l=4]  pred_movvnx2sf/1
        vle32.v v1,0(a5)        # 25    [c=4 l=4]  pred_movvnx2sf/1
        vsetvli t0,zero,e32,mf2,ta,ma   # 101   [c=0 l=4]  vsetvldi
        vfwcvt.f.f.v    v3,v2   # 77    [c=4 l=4]  pred_extendvnx2df/0
        vfwcvt.f.f.v    v2,v1   # 79    [c=4 l=4]  pred_extendvnx2df/0
        vsetvli zero,t1,e32,mf2,ta,ma   # 102   [c=0 l=4]  vsetvl_discard_resultdi
        vle32.v v5,0(a6)        # 31    [c=4 l=4]  pred_movvnx2sf/1
        vle32.v v4,0(a7)        # 39    [c=4 l=4]  pred_movvnx2sf/1
        vsetvli t0,zero,e32,mf2,ta,ma   # 103   [c=0 l=4]  vsetvldi
        vfwcvt.f.f.v    v1,v5   # 81    [c=4 l=4]  pred_extendvnx2df/0
        vsetvli zero,zero,e64,m1,ta,ma  # 104   [c=16 l=4]  vsetvl_vtype_change_only
        vfmul.vv        v5,v2,v3        # 29    [c=4 l=4]  pred_mulvnx2df/2
        vfmul.vv        v2,v1,v2        # 34    [c=4 l=4]  pred_mulvnx2df/2
        vsetvli zero,t1,e64,m1,ta,ma    # 105   [c=0 l=4]  vsetvl_discard_resultdi
        vse64.v v2,0(a1)        # 35    [c=4 l=4]  pred_storevnx2df
        vse64.v v5,0(a0)        # 30    [c=4 l=4]  pred_storevnx2df
        vsetvli t6,zero,e64,m1,ta,ma    # 106   [c=0 l=4]  vsetvldi
        vfmul.vv        v1,v1,v3        # 37    [c=4 l=4]  pred_mulvnx2df/2
        vsetvli zero,zero,e32,mf2,ta,ma # 107   [c=20 l=4]  vsetvl_vtype_change_only
        vfwcvt.f.f.v    v2,v4   # 83    [c=4 l=4]  pred_extendvnx2df/0
        vsetvli zero,t1,e64,m1,ta,ma    # 108   [c=0 l=4]  vsetvl_discard_resultdi
        vse64.v v1,0(a2)        # 38    [c=4 l=4]  pred_storevnx2df
        vsetvli t6,zero,e64,m1,ta,ma    # 109   [c=0 l=4]  vsetvldi
        slli    t4,t1,2 # 22    [c=4 l=4]  ashldi3
        slli    t3,t1,3 # 27    [c=4 l=4]  ashldi3
        vfmul.vv        v1,v2,v3        # 42    [c=4 l=4]  pred_mulvnx2df/2

Note how the output of the explicit conversion done in insn 77 is used by the vfmul in insns 29, 37 and 42. Similarly for the other explicit conversions.
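
To make the sharing concrete, here is a rough C sketch of a loop with the same data flow as the dump above. It is not the test case behind the godbolt link; the function name, parameter names and the mapping of arrays to registers are all assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical kernel (not the original test case): each float input is
   widened once, and the widened value feeds more than one double multiply. */
void shared_extend(const float *a, const float *b, const float *c,
                   const float *d, double *o0, double *o1, double *o2,
                   double *o3, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double wa = a[i];   /* plays the role of insn 77: three users */
        double wb = b[i];   /* plays the role of insn 79: two users */
        double wc = c[i];   /* plays the role of insn 81: two users */
        double wd = d[i];   /* plays the role of insn 83: one user */

        o0[i] = wb * wa;    /* cf. insn 29 */
        o1[i] = wc * wb;    /* cf. insn 34 */
        o2[i] = wc * wa;    /* cf. insn 37 */
        o3[i] = wd * wa;    /* cf. insn 42 */
    }
}
```

Because wa, wb and wc each have more than one user, combining a conversion into a single multiply still leaves the conversion live for the other users.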


Your pattern isn't going to help that problem.

You could model this as a dependency height reduction. I think that will get you where you want to go.


You'll need a pattern that matches this:

(parallel [(set (reg:VNx2DF 160 [ vect__11.15 ])
            (if_then_else:VNx2DF (unspec:VNx2BI [
                        (const_vector:VNx2BI repeat [
                                (const_int 1 [0x1])
                            ])
                        (reg:DI 169)
                        (const_int 2 [0x2]) repeated x2
                        (const_int 1 [0x1])
                        (const_int 7 [0x7])
                        (reg:SI 66 vl)
                        (reg:SI 67 vtype)
                        (reg:SI 69 frm)
                    ] UNSPEC_VPREDICATE)
                (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ]))
                    (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
                (unspec:VNx2DF [
                        (reg:SI 0 zero)
                    ] UNSPEC_VUNDEF)))
        (set (reg:VNx2DF 143 [ vect__8.14 ])
            (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ])))
        (set (reg:VNx2DF 145 [ vect__5.10 ])
            (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
    ])

It'll need to be a define_insn_and_split as it's a 3->3 splitter. The split will emit the two extensions and the widening multiply as 3 distinct insns.

This has two positive effects. First, the widening multiply is no longer data dependent on the float_extend, so it can issue whenever r144 and r146 are ready rather than when r143 and r145 are ready.

The second effect is that I think this pattern will end up matching all the multiplies in this sample code. As a result, all the float_extend insns you generated when splitting become dead and should be removed by DCE.
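
As a scalar analogy only, the end state might look like the C sketch below once every multiply has been rewritten. widen_mul is a hypothetical stand-in for vfwmul.vv, and the function and parameter names are made up for illustration; none of this is taken from the patch or the test case.

```c
#include <stddef.h>

/* Hypothetical scalar stand-in for a widening multiply such as vfwmul.vv:
   it consumes the narrow inputs directly and produces the wide product. */
static inline double widen_mul(float x, float y)
{
    return (double)x * (double)y;
}

/* Every product reads the float inputs directly, so no widened temporary
   is shared between multiplies and no standalone conversion is needed. */
void fused_widen(const float *a, const float *b, const float *c,
                 const float *d, double *o0, double *o1, double *o2,
                 double *o3, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        o0[i] = widen_mul(b[i], a[i]);
        o1[i] = widen_mul(c[i], b[i]);
        o2[i] = widen_mul(c[i], a[i]);
        o3[i] = widen_mul(d[i], a[i]);
    }
}
```

Each multiply here depends only on the loads, which is the height reduction, and nothing consumes a separately widened value, which is why the split-out extensions would be dead.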



Jeff
