On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2021/7/22 下午8:56, Richard Biener wrote:
> > On Tue, Jul 20, 2021 at 4:37
> > PM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> This v2 has addressed some review comments/suggestions:
> >>
> >>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>     to support loop hierarchy tree rather than just a function,
> >>     and adjust to use loops* accordingly.
> >
> > I actually meant struct loop *, not struct loops * ;)  At the point
> > we pondered to make loop invariant motion work on single
> > loop nests we gave up not only but also because it iterates
> > over the loop nest but all the iterators only ever can process
> > all loops, not say, all loops inside a specific 'loop' (and
> > including that 'loop' if LI_INCLUDE_ROOT).  So the
> > CTOR would take the 'root' of the loop tree as argument.
> >
> > I see that doesn't trivially fit how loops_list works, at least
> > not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> > could be adjusted to do ONLY_INNERMOST as well?
> >
>
>
> Thanks for the clarification!  I just realized that the previous
> version with struct loops* is problematic, all traversal is
> still bounded with outer_loop == NULL.  I think what you expect
> is to respect the given loop_p root boundary.  Since we just
> record the loops' nums, I think we still need the function* fn?

Would it simplify things if we recorded the actual loop *?

There's still the to_visit reserve which needs a bound on
the number of loops for efficiency reasons.

> So I add one optional argument loop_p root and update the
> visiting codes accordingly.  Before this change, the previous
> visiting uses the outer_loop == NULL as the termination condition,
> it perfectly includes the root itself, but with this given root,
> we have to use it as the termination condition to avoid to iterate
> onto its possible existing next.
>
> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> code like:
>
>     struct loops *fn_loops = loops_for_fn (fn)->larray;
>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>         if (aloop != NULL
>             && aloop->inner == NULL
>             && flow_loop_nested_p (tree_root, aloop))
>              this->to_visit.quick_push (aloop->num);
>
> it has the stable bound, but if the given root only has several
> child loops, it can be much worse if there are many loops in fn.
> It seems impossible to predict the given root loop hierarchy size,
> maybe we can still use the original linear searching for the case
> loops_for_fn (fn) == root?  But since this visiting seems not so
> performance critical, I chose to share the code originally used
> for FROM_INNERMOST, hope it can have better readability and
> maintainability.

I was indeed looking for something that has execution/storage
bound on the subtree we're interested in.  If we pull the CTOR
out-of-line we can probably keep the linear search for
LI_ONLY_INNERMOST when looking at the whole loop tree.

It just seemed to me that we can eventually re-use a
single loop tree walker for all orders, just adjusting the
places we push.

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Does the attached patch meet what you expect?

So yeah, it's probably close to what is sensible.  Not sure
whether optimizing the loops for the !only_push_innermost_p
case is important - if we manage to produce a single
walker with conditionals based on 'flags' then IPA-CP should
produce optimal clones as well I guess.

Richard.

>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * cfgloop.h (loops_list::loops_list): Add one optional argument root
>         and adjust accordingly.

Reply via email to