On Fri, 14 May 2021, Xionghu Luo wrote:

> Hi Richi,
> 
> On 2021/4/21 19:54, Richard Biener wrote:
> > On Tue, 20 Apr 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/4/15 19:34, Richard Biener wrote:
> >>> On Thu, 15 Apr 2021, Xionghu Luo wrote:
> >>>
> >>>> Thanks,
> >>>>
> >>>> On 2021/4/14 14:41, Richard Biener wrote:
> >>>>>> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
> >>>>>> but it moves #538 first, then #235, there is strong dependency here. It
> >>>>>> seemsdoesn't like the LCM framework that could solve all and do the
> >>>>>> delete-insert in one iteration.
> >>>>> So my question was whether we want to do both within the LCM store
> >>>>> sinking framework.  The LCM dataflow is also used by RTL PRE which
> >>>>> handles both loads and non-loads so in principle it should be able
> >>>>> to handle stores and non-stores for the sinking case (PRE on the
> >>>>> reverse CFG).
> >>>>>
> >>>>> A global dataflow is more powerful than any local ad-hoc method.
> >>>>
> >>>> My biggest concern is whether the LCM DF framework could support sinking
> >>>> *multiple* reverse-dependent non-store instructions together by *one*
> >>>> calling of LCM DF.   If this is not supported, we need run multiple LCM
> >>>> until no new changes, it would be time consuming obviously (unless
> >>>> compiling time is not important here).
> >>>
> >>> As said it is used for PRE and there it most definitely can do that.
> >>
> >> I did some investigation about PRE and attached a case to show how it
> >> works, it is quite like store-motion, and actually there is a rtl-hoist
> >> pass in gcse.c which only works for code size.  All of them are
> >> leveraging the LCM framework to move instructions upward or downward.
> >>
> >> PRE and rtl-hoist move instructions upward, they analyze/hash the SOURCE
> >> exprs and call pre_edge_lcm, store-motion and rtl-sink move instructions
> >> downward, so they analyze/hash the DEST exprs and call pre_edge_rev_lcm.
> >> The four problems are all converted to the LCM DF problem with
> >> n_basic_blocks * m_exprs of 4 matrix (antic, transp, avail, kill) as input
> >> and two outputs of where to insert/delete.
> >>
> >> PRE scan each instruction and hash the SRC to table without *checking the
> >> relationship between instructions*, for the case attached, BB 37, BB 38
> >> and BB 41 both contains SOURCE expr "r262:DI+r139:DI", but BB 37 and BB 41
> >> save it to index 106, BB 38 save it to index 110. After finishing this 
> >> pass,
> >> "r262:DI+r139:DI" BB41 is replaced with "r194:DI=r452:DI", then insert
> >> expr to BB 75~BB 80 to create full redundancies from partial redundancies,
> >> finally update instruction in BB 37.
> > 
> > I'm not familiar with the actual PRE code but reading the toplevel comment
> > it seems that indeed it can only handle expressions contained in a single
> > insn unless a REG_EQUAL note provides a short-hand for the larger one.
> > 
> > That of course means it would need to mark things as not transparent
> > for correctness where they'd be if moved together.  Now, nothing
> > prevents you changing the granularity of what you feed LCM.
> > 
> > So originally we arrived at looking into LCM because there's already
> > a (store) sinking pass on RTL (using LCM) so adding another (loop-special)
> > one didn't look like the best obvious solution.
> > 
> > That said, LCM would work for single-instruction expressions.
> > Alternatively a greedy algorithm like you prototyped could be used.
> > Another pass to look at would be RTL invariant motion which seems to
> > compute some kind of dependency graph - not sure if that would be
> > adaptable for the reverse CFG problem.
> > 
> 
> Actually my RTL sinking pass patch is borrowed from RTL loop invariant
> motion, it is  quite limited since only moves instructions from loop header
> to loop exits, though it could be refined with various of algorithms.
> Compared to the initial method of running gimple sink pass once more, 
> it seems much more complicated and limited without gaining obvious performance
> benefit, shall we turn back to consider gimple sink2 pass from original since
> we are in stage1 now?

OK, so while there might be new sinking opportunities exposed during
RTL expansion and early RTL opts we can consider adding another sink pass
on GIMPLE.  Since it's basically a scheduling optimization placement
shouldn't matter much but I suppose we should run it before store
merging, so anywhere between cd_dce and that.

Richard.

Reply via email to