Re: [RFC] split pseudos during loop unrolling in RTL unroller

Segher Boessenkool Thu, 23 Apr 2020 06:44:26 -0700

On Thu, Apr 23, 2020 at 03:07:23PM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> >
> > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > But being stuck with something means no progress...  I know
> > > > > very well it's 100 times harder to get rid of something than to
> > > > > add something new on top.
> > > >
> > > > Well, what progress do you expect to make?  After expand that is :-)
> > >
> > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > no CSE, ...
> >
> > RTL CSE for example is very much required to get any good code.  It
> > needs to CSE stuff that wasn't there before expand.
> 
> Sure, but then we should fix that!


The expand pass, but also many RTL passes, will naturally generate code
that can be CSEd.  You don't want passes to have to do this themselves:
for example, this can be constants used to implement some standard
patterns in target code, etc.
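
To illustrate the point above, here is a minimal sketch of local CSE over
a toy instruction list.  It is not how GCC's RTL CSE works; the tuple
format and the name local_cse are hypothetical, and it assumes one basic
block in which every destination is a fresh pseudo assigned exactly once.

```python
def local_cse(insns):
    """Toy local CSE over (dest, op, arg1, arg2) tuples in one basic block.

    Assumption: every dest is a fresh pseudo that is set exactly once,
    so an earlier result can always stand in for a later identical one.
    """
    available = {}  # (op, arg1, arg2) -> pseudo that already holds the value
    replaced = {}   # redundant pseudo -> earlier equivalent pseudo
    out = []
    for dest, op, a, b in insns:
        # Rewrite operands that refer to pseudos we already eliminated.
        a = replaced.get(a, a)
        b = replaced.get(b, b)
        key = (op, a, b)
        if key in available:
            # Same computation seen before: drop the insn, remember the alias.
            replaced[dest] = available[key]
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out


# Two expansions that each materialise the same mask constant on their own.
insns = [
    ("r1", "const", 0x12345678, None),
    ("r2", "and", "x", "r1"),
    ("r3", "const", 0x12345678, None),
    ("r4", "and", "y", "r3"),
]
result = local_cse(insns)
```

Each expansion emits its own copy of the constant without knowing about
the other one; the later pass drops the second load and makes r4 use r1.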

> > LOL.
> >
> > The expand pass doesn't often make good choices, and it *shouldn't*, it
> > should not make many choices at all; it should just generate valid RTL,
> > new pseudos for everything, and let later RTL passes make faster code
> > from that.
> 
> But valid RTL is instructions that are recognized.  Which means
> when the target doesn't support an SImode add we may not create
> one.  That's instruction selection ;)

In that sense, you can call all RTL passes instruction selection?
Usually I understand that to mean things more like combine, cprop, fwprop,
cse, ifcvt, splitters, peepholes.  That kind of thing :-)

Pretty much all of the RTL passes before RA, and a few after it.

> > > > Most of what is done in RTL is done very well.
> > >
> > > Umm, well...  I beg to differ with regard to DF and passes like
> > > postreload-gcse.
> >
> > What is wrong with DF?
> 
> It's slow and memory hungry?

Very true, of course.  But can this be significantly better?

> > Is there something particular in postreload-gcse that is bad?  To me it
> > always is just one of those passes that doesn't do anything :-)  That
> > can and should be cleaned up, sure :-)
> 
> postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
> blow up (compute_transp) when it doesn't really require it
> (I've fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
> if we want to do PRE of loads, we don't simply schedule another
> gcse pass rather than implementing a new one.  IIRC what the pass
> does could be done with much more local dataflow.  Both
> postreload gcse and cse are major time-hogs on "bad" testcases :/

RTL CSE?  Really?  It just loves to give up early (which is a bad thing
of course, but that makes it take bounded time, and *less* on bad
testcases :-) )

So the "normal" gcse does not have this problem?

> > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > there.  But for low-level, close-to-the-machine stuff, RTL is much
> > better suited.  And we *do* want to optimise at that level as well, and
> > much more than just peepholes.
> 
> Well, everything that requires costing (unrolling, vectorization,
> IV selection to name a few) _is_ close-to-the-machine.  We're
> just saying they are not because GIMPLE is so much easier to
> work with here (not sure why exactly...).

Those transforms aren't close to the machine, not in the same way,
because they are beneficial independent of what exact instruction
sequences are generated.

Both are nasty in that both have cases where doing the transform actually
hurts quite a bit; but *not* doing it where it *could* be done costs a lot
as well.  But other than that "little" issue ;-)


Segher
