Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Jiufu Guo via Gcc-patches Tue, 26 May 2020 21:37:24 -0700

Segher Boessenkool <seg...@kernel.crashing.org> writes:

> Hi!
>
> On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
>> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>> <seg...@kernel.crashing.org> wrote:
>> > Yes, cunroll does not have its own option, and that is a problem.  But
>> > that is easy to fix!  Either with an option, or just with params (the
>> > option wouldn't do more than set params anyway?)
>> 
>> Well, given coming up with different names for essentially the same
>> transform is going to be challenging how about sth like
>> 
>> -funroll-loops={early,late,static,dynamic}[insert better names here]
>
> User interface is hard :-)  I think luckily we don't need to change
> anything there yet, just have an internal flag?
>
> But complete unrolling is something quite different, so it should have
> its own flag anyway (at least internally).
>
>> note there's also -fpeel-loops which may match the transform
>> done on GIMPLE better?
>
> Peeling and unrolling are the same thing, if doing complete unrolling
> (or complete peeling), followed by DCE in both cases.  Peeling is a
> nicer name here I think, yeah.
>
>> I'm not sure which are the technically
>> correct terms for unrollings that elide the loop (the backedge).
>
> I don't know a better term than "complete", I don't remember ever seeing
> something else either.


How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?

And then using this flags as:
  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
                                                   || optimize >= 3, true);

And we do not need to enable this flag at -O2.

Thanks for all your helpful comments again!

Jiufu

>
>> We're doing such kind of unrolling even if we cannot statically
>> decide which of a set of possible exits we take (and internally
>> call that peeling, if we can statically decide we call it complete
>> unrolling).
>
> "Peeling" is placing some copies of the loop before the loop;
> "unrolling" is placing a few copies of the loop inside the loop body.
> Does that match usage here?
>
>> The RTL side OTOH only performs classical unrolling,
>> preserving the backedge with various strategies for the
>> remaining iterations.
>
> And if you do complete unrolling that way, the backedge can be removed,
> since it can be shown never to be taken.
>
>> As said, for the regression on the 10 branch with ppc I'd add
>> [a hidden] flag that controls the RTL unroller, also set by
>> -funroll-loops and triggered by the ppc specific heuristics.
>
> But the problem is in cunroll?  This is so backwards...  Because some
> other transform abuses the unroller flags, adding a second level flag
> with the same meaning :-(  It will work for fixing the regression,
> sure, and it is slightly less code as well.
>
>
> Segher

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Reply via email to