On Wed, Nov 10, 2021 at 2:29 PM Martin Liška <mli...@suse.cz> wrote:
>
> On 11/10/21 09:59, Richard Biener wrote:
> > On Tue, Nov 9, 2021 at 5:44 PM Martin Liška <mli...@suse.cz> wrote:
> >>
> >> On 11/9/21 14:37, Richard Biener wrote:
> >>> On Mon, Nov 8, 2021 at 8:45 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >>>>
> >>>> On 11/8/21 10:05 AM, Martin Liška wrote:
> >>>>> On 9/28/21 22:39, Andrew MacLeod wrote:
> >>>>>> In Theory, modifying the IL should be fine, it happens already in
> >>>>>> places, but its not extensively tested under those conditions yet.
> >>>>>
> >>>>> Hello Andrew.
> >>>>>
> >>>>> I've just tried using a global gimple_ranger and it crashes when loop
> >>>>> unswitching duplicates
> >>>>> some BBs.
> >>>>>
> >>>>> Please try the attached patch for:
> >>>>
> >>>> hey Martin,
> >>>>
> >>>> try using this in your tree.  Since nothing else is using a growing BB
> >>>> right now, I'll let you work with it and see if everything works as
> >>>> expected before checking it in, just in case we need more tweaking.
> >>>> With this,
> >>>>
> >>>> make RUNTESTFLAGS=dg.exp=loop-unswitch*.c check-gcc
> >>>>
> >>>> runs clean.
> >>>>
> >>>>
> >>>> basically, I tried to grow it by either a factor of 10% for the current
> >>>> BB size when the grow is requested, or some double the needed extra
> >>>> size, or 128... whichever value is "maximum"    That means it shoudnt be
> >>>> asking for tooo much each time, but also not a minimum amount.
> >>>>
> >>>> Im certainly open to suggestion on how much to grow it each time.
> >>>> Note the vector being grown is ONLY fo the SSA_NAme being asked for.. so
> >>>> it really an on-demand thing just for specific names, in your case,
> >>>> mostly just the switch index.
> >>>>
> >>>> Let me know how this works for you, and if you have any other issues.
> >>>
> >>> So I think in the end we shouldn't need the growing.  Ideally we'd do all
> >>> the analysis before the first transform, but for that we'd need ranger to
> >>> be able to "simplify" conditions based on a known true/false predicate
> >>> that's not yet in the IL.  Consider
> >>>
> >>>    for (;;)
> >>>      {
> >>>           if (invariant < 3) // A
> >>>             {
> >>> ...
> >>>             }
> >>>           if (invariant < 5) // B
> >>>             {
> >>> ...
> >>>             }
> >>>      }
> >>>
> >>> unswitch analysis will run into the condition 'A' and determine the loop
> >>> can be unswitched with the condition 'invariant < 3'.  To be able to
> >>> perform cost assessment and to avoid redundant unswitching we
> >>> want to determine that if we unswitch with 'invariant < 3' being
> >>> true then the condition at 'B' is true as well before actually inserting
> >>> the if (invariant < 3) outside of the loop.
> >>>
> >>> So I'm thinking of assigning a gimple_uid to each condition we want to
> >>> unswitch on and have an array indexed by the uid with meta-data on
> >>> the unswitch opportunity, the "related" conditions could be marked with
> >>> the same uid (or some other), and the folding result recorded so that
> >>> at transform time we can just do the appropriate replacement without
> >>> invoking ranger again.
> >>
> >> Calculating all this before transformation is quite ambitious based on the 
> >> code
> >> we have now.
> >>
> >> Note one can have in a loop:
> >>
> >> if (a > 100)
> >>      ...
> >>
> >> switch (a)
> >>      case 1000:
> >>        ...
> >>      case 20:
> >>        ...
> >>      case 200:
> >>        ...
> >>
> >> which means the first predicate effectively makes some cases unreachable. 
> >> Moreover
> >> one can have
> >>
> >> if (a > 100 && b < 300)
> >>      ...
> >>
> >> and more complex conditions.
> >
> > True - I guess we should do two things.
>
> All right.
>
> >
> >   1) keep simplify_using_entry_checks like code for symbolic conditions
> >   2) add integer ranges for unswitch conditions producing them, that
> >       includes all unswitching of switch stmts - we might be able to use
> >       the ranger queries (with global ranges) to simplify stmts with the
> >       known ranges as noted by Andrew
> >
> > I do think that pre-computing the simplifications is what we should do
> > to be able to make the cost modeling sane.  What we can avoid
> > trying is evaluating multiple unswitch possibilities to pick the "best".
>
> So the first step would be taking all unswitching candidates (gconds 
> basically)
> and grouping them (all items in a group would fold to true edge in the 
> unswitched loop).
> Is it something we want to do combining simplify_using_entry_checks and 
> fold_range ranger
> capability?

The first step is to, after finding the first condition to unswitch on, continue
scanning the loop body for related conditions that will simplify to be able to
assess the true cost of the unswitching more readily and to prepare the
simplification step after the loop copying.

That is, the first step is to "fix" the cost modeling.

I wouldn't go as far as trying to discover all unswitching opportunities at once
yet - even though the far goal would be to maybe do that to discover the most
profitable / cheapest.

If you look at simplify_using_entry_checks then this is really really simple,
so I'd try to abstract this, recording sth like a unswitch_predicate where
we store the condition we unswitch on plus maybe cache the constant
range of a VAR cmp CST variable condition on the true/false edge.  We
can then try to simplify each gcond/gswitch based on such an unswitch_predicate
(when we ever scan the loop once to discover all opportunities we'd have a
set of unswitch_predicates to try simplifying against).  As said the integer
range thing would be an improvement over the current state so even that
can be done as followup but I guess for gswitch support that's going to be
the thing to use.

So I'd try to do no functional change first, improving the costing and
setting up the transform to simply pick up the stmts to "fold" as discovered
during analysis (as I hinted you possibly can use gimple_uid to mark
the stmts that simplify, IIRC gimple_uid is preserved during copying.
gimple_uid would also scale better than gimple_plf in case we do
the analysis for all candidates at once).

As for costing in the end we should probably assign a budget based on
the original loop and eat from that for all unswitchings done on it.

> >
> > I think changing the code do to the analysis first should be done
> > before wiring in gcond support, even adding the additional 'range'
>
> s/gcond/switch, right?

Eh, yes - obviously ;)

>
> > capability will be useful without that since the current code
> > wont figure out a > 5 is true when we unswitch on a > 3.
>
> Agree that gswitch support should be added later.
>
> Martin
>
> >
> >>>
> >>> Now, but how do we arrange for the ranger analysis here?
> >>
> >> That's likely something we need support from ranger, yes.
> >>
> >>>
> >>> We might also somehow want to remember that on the
> >>> 'invariant < 3' == false copy of the loop there's still the
> >>> unswitching opportunity on 'invariant < 5', but not on the
> >>> 'invariant < 5' == true copy.
> >>>
> >>> Currently unswitching uses a custom simplify_using_entry_checks
> >>> which tries to do simplification only after the fact (and so costing
> >>> also is far from costing the true cost and ordering of the opportunities
> >>> to do the best first is not implemented either).
> >>
> >> I'm sending updated version of the patch where I changed:
> >> - simplify_using_entry_checks is put back for the floating point 
> >> expressions
> >> - all scans utilize scan-tree-dump-times
> >> - some new tests were added
> >> - global ranger is used (I rely on the growing patch provided by Andrew)
> >> - better ranger API is used for gcond expression: ranger.range_of_stmt (r, 
> >> stmt) && r.singleton_p (&result))
> >> - auto_edge_flag is used now
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Thoughts?
> >> Martin
> >>
> >>>
> >>> Richard.
> >>>
> >>>> Andrew
> >>>>
>

Reply via email to