Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

Richard Sandiford via Gcc-patches Mon, 25 Jan 2021 09:59:32 -0800

Richard Biener <rguent...@suse.de> writes:
> On Fri, 22 Jan 2021, Segher Boessenkool wrote:
>
>> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
>> > On Thu, 21 Jan 2021, Segher Boessenkool wrote:
>> > > What is holding up this patch still?  Ke Wen has pinged it every month
>> > > since May, and there has still not been a review.
>> 
>> Richard Sandiford wrote:
>> > FAOD (since I'm on cc:), I don't feel qualified to review this.
>> > Tree-level loop stuff isn't really my area.
>> 
>> And Richard Biener wrote:
>> > I don't like it, it feels wrong but I don't have a good suggestion
>> > that had positive feedback.  Since a reviewer / approver is indirectly
>> > responsible for at least the design I do not want to ack this patch.
>> > Bin made forward progress on the other parts of the series but clearly
>> > there's somebody missing with the appropriate privileges who feels
>> > positive about the patch and its general direction.
>> > 
>> > Sorry to be of no help here.
>> 
>> How unfortunate :-(
>> 
>> So, first off, this will then have to work for next stage 1 to make any
>> progress.  Rats.
>> 
>> But what could have been done differently that would have helped?  Of
>> course Ke Wen could have written a better patch (aka one that is more
>> acceptable); either of you could have made your current replies earlier,
>> so that it is clear help needs to be sought elsewhere; and I could have
>> pushed people earlier, too.  No one really did anything wrong, I'm not
>> seeking who to blame, I'm just trying to find out how to prevent
>> deadlocks like this in the future (where one party waits for replies
>> that will never come).
>> 
>> Is it just that we have a big gaping hole in reviewers with experience
>> in such loop optimisations?
>
> May be.  But what I think is the biggest problem is that we do not
> have a good way to achieve what the patch tries (if you review the
> communications you'll see many ideas tossed around) first and foremost
> because IV selection is happening early on GIMPLE and unrolling
> happens late on RTL.  Both need a quite accurate estimate of costs
> but unrolling has an ever harder time than IV selection where we've
> got along with throwing dummy RTL at costing functions.
>
> IMHO the patch is the wrong "start" to try fixing the issue and my
> fear is that wiring this kind of "features" into the current
> (fundamentally broken) state will make it much harder to rework
> that state without introducing regressions on said features (I'm
> there with trying to turn the vectorizer upside down - for three
> years now, struggling to not regress any of the "features" we've
> accumulated for various targets where most of them feel a
> "bolted-on" rather than well-designed ;/).


Thinking of any features in particular here?

Most of the ones I can think of seem to be doing things in the way
that the current infrastructure expects.  But of course, the current
infrastructure isn't perfect, so the end result isn't either.

Still, I agree with the above apart from maybe that last bit. ;-)

> I think IV selection and unrolling (and scheduling FWIW) need to move
> closer together.  I do not have a good idea how that can work out
> though but I very much believe that this "most wanted" GIMPLE unroller
> will not be a good way of progressing here.

What do you feel about unrolling in the vectoriser (by doubling the VF, etc.)
in cases where something about the target indicates that that would be
useful?  I think that's a good place to do it (for the cases that it
handles) because it's hard to unroll later and then interleave.

> Maybe taking the bullet and moving IV selection back to RTL is the
> answer.

I think that would be a bad move.  The trend recently seems to have been
to lower stuff to individual machine operations earlier in the rtl pass
pipeline (often immediately during expand) rather than split them later.
The reasoning behind that is that (1) gimple has already heavily optimised
the unlowered form and (2) lowering earlier gives the more powerful rtl
optimisers a chance to do something with the individual machine operations.
It's going to be hard for an RTL ivopts pass to piece everything back
together.

> For a "short term" solution I still think that trying to perform
> unrolling and IV selection (for the D-form case you're targeting)
> at the same time is a better design, even if it means complicating
> the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
> any good idea about scheduling).  There are currently 20+ GIMPLE
> optimization passes and 10+ RTL optimization passes between
> IV selection and unrolling, the idea that you can have transform
> decision and transform apply this far apart looks scary.

FWIW, another option might be to go back to something like:

  https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532676.html

I agree that it was worth putting that series on hold and trying a more
target-independent approach, but I think in the end it didn't work out,
for the reasons Richard says.  At least the target-specific pass would
be making a strict improvement to the IL that it sees, rather than
having to predict what future passes might do or might want.

Thanks,
Richard

Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

Reply via email to