[PATCH v2] Make loops_list support an optional loop_p root

2021-07-29 Thread Kewen.Lin via Gcc-patches
on 2021/7/29 4:01 PM, Richard Biener wrote:
> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote:
>>
>> on 2021/7/22 8:56 PM, Richard Biener wrote:
>>> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin wrote:

>>>> Hi,
>>>>
>>>> This v2 has addressed some review comments/suggestions:
>>>>
>>>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>>>     to support a loop hierarchy tree rather than just a function,
>>>>     and adjust to use loops* accordingly.
>>>
>>> I actually meant struct loop *, not struct loops * ;)  When we
>>> pondered making loop invariant motion work on single loop nests,
>>> we gave up partly because it iterates over the loop nest, while
>>> all the iterators can only ever process all loops, not, say, all
>>> loops inside a specific 'loop' (including that 'loop' if
>>> LI_INCLUDE_ROOT).  So the CTOR would take the 'root' of the loop
>>> tree as an argument.
>>>
>>> I see that doesn't trivially fit how loops_list works, at least
>>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
>>> could be adjusted to do ONLY_INNERMOST as well?
>>>
>>
>>
>> Thanks for the clarification!  I just realized that the previous
>> version with struct loops* is problematic: all traversal is still
>> bounded by outer_loop == NULL.  I think what you expect is to
>> respect the given loop_p root boundary.  Since we only record the
>> loops' numbers, I think we still need the function* fn?
> 
> Would it simplify things if we recorded the actual loop *?
> 

I'm afraid it's unsafe to record the loop*.  I had the same
question about why the loop iterator uses an index rather than a
loop* when I first read this code.  I guess the design allows its
users to update or even delete the following loops that are still
to be visited.  For example, a user may transform one loop by
duplicating it and its children somewhere else and then removing
the original loop and its children.  When the iteration later
reaches those children, the "index" approach can check their
validity with get_loop at that point, but with the "loop *"
approach some recorded pointers become dangling.  They cannot be
validated on their own, so we would seem to need a side linear
search to ensure validity.
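To make the index-versus-pointer argument concrete, here is a minimal C++ sketch built on a toy loop table rather than GCC's real data structures. The names LoopTable, LoopsByNum and Iter are hypothetical. The list records loop numbers up front and looks each one up again only when the iteration reaches it, much like calling get_loop, so a loop removed mid-walk is skipped instead of being dereferenced. Iter::operator!= compares positions, which is all a range-based for loop needs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loop; only its number matters here.
struct Loop {
  int num;
};

// Hypothetical loop table owning every loop, indexed by loop number.
// Removing a loop frees it and leaves a null slot, so get_loop () on a
// removed number returns nullptr instead of a dangling pointer.
struct LoopTable {
  std::vector<std::unique_ptr<Loop>> larray;

  Loop *get_loop(int num) const {
    if (num < 0 || static_cast<std::size_t>(num) >= larray.size())
      return nullptr;
    return larray[num].get();
  }
  void remove_loop(int num) {
    assert(num >= 0 && static_cast<std::size_t>(num) < larray.size());
    larray[num].reset();
  }
};

// Records loop numbers up front and re-validates each one when the
// iteration reaches it, skipping loops removed in the meantime.
class LoopsByNum {
 public:
  LoopsByNum(const LoopTable &table, std::vector<int> nums)
      : table_(table), to_visit_(std::move(nums)) {}

  class Iter {
   public:
    Iter(const LoopsByNum &list, std::size_t idx) : list_(list), idx_(idx) {
      skip_removed();
    }
    Loop *operator*() const {
      return list_.table_.get_loop(list_.to_visit_[idx_]);
    }
    Iter &operator++() {
      ++idx_;
      skip_removed();
      return *this;
    }
    // Position-based comparison, as a range-based for loop requires.
    bool operator!=(const Iter &rhs) const { return idx_ != rhs.idx_; }

   private:
    void skip_removed() {
      while (idx_ < list_.to_visit_.size()
             && list_.table_.get_loop(list_.to_visit_[idx_]) == nullptr)
        ++idx_;
    }
    const LoopsByNum &list_;
    std::size_t idx_;
  };

  Iter begin() const { return Iter(*this, 0); }
  Iter end() const { return Iter(*this, to_visit_.size()); }

 private:
  const LoopTable &table_;
  std::vector<int> to_visit_;
};

// Visit loops 1, 2, 3, removing loop 3 while loop 1 is being processed.
std::vector<int> visit_with_removal() {
  LoopTable t;
  for (int i = 0; i < 4; ++i)
    t.larray.push_back(std::make_unique<Loop>(Loop{i}));
  std::vector<int> seen;
  for (Loop *l : LoopsByNum(t, {1, 2, 3})) {
    seen.push_back(l->num);
    if (l->num == 1)
      t.remove_loop(3);  // a not-yet-visited loop disappears mid-iteration
  }
  return seen;
}
```

In visit_with_removal, loop 3 is deleted while loop 1 is being processed, so the body only sees loops 1 and 2. Had the list stored Loop pointers instead, the third step would have read freed memory.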

> There's still the to_visit reserve which needs a bound on
> the number of loops for efficiency reasons.
> 

Yes, I still keep the fn in the updated version.

>> So I added an optional argument loop_p root and updated the
>> visiting code accordingly.  Before this change, the traversal
>> used outer_loop == NULL as the termination condition, which
>> naturally includes the root itself; with an explicit root, we
>> have to use that root as the termination condition to avoid
>> iterating onto its next sibling, if one exists.
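The termination issue can be illustrated with a small C++ sketch of a FROM_INNERMOST walk, in which every loop is visited after the loops it contains. The Loop struct, add_child and walk_from_innermost are hypothetical. They only borrow the inner/next link names from GCC's class loop and add an explicit outer pointer. The bounded_by_root switch contrasts the old outer == NULL rule with stopping at the given root.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node: inner = first child, next = next sibling,
// outer = enclosing loop (nullptr for the function's tree root).
struct Loop {
  int num;
  Loop *outer = nullptr;
  Loop *inner = nullptr;
  Loop *next = nullptr;
};

// Append CHILD as the last child of PARENT.
void add_child(Loop *parent, Loop *child) {
  child->outer = parent;
  Loop **slot = &parent->inner;
  while (*slot != nullptr)
    slot = &(*slot)->next;
  *slot = child;
}

// FROM_INNERMOST walk starting at ROOT.  With BOUNDED_BY_ROOT false the walk
// only stops at a loop with no outer loop (the old rule); with it true the
// walk stops right after visiting ROOT.
std::vector<int> walk_from_innermost(Loop *root, bool bounded_by_root) {
  assert(root != nullptr);
  std::vector<int> order;
  Loop *l = root;
  while (l->inner != nullptr)  // first innermost loop under ROOT
    l = l->inner;
  for (;;) {
    order.push_back(l->num);  // everything inside L was visited already
    if (bounded_by_root && l == root)
      break;  // new rule: ROOT is the boundary
    if (l->next != nullptr) {
      l = l->next;  // next sibling, then down to its first innermost loop
      while (l->inner != nullptr)
        l = l->inner;
    } else if (l->outer == nullptr) {
      break;  // old rule: stop once outer_loop == NULL
    } else {
      l = l->outer;  // all children done, so the parent comes next
    }
  }
  return order;
}
```

Consider a tree where loop 0 contains loops 1 and 4, and loop 1 contains loops 2 and 3. Walking from root 1 under the old rule yields 2, 3, 1, 4, 0, which escapes into loop 1's sibling and parent. Stopping at the root yields 2, 3, 1.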
>>
>> For LI_ONLY_INNERMOST, I was wondering whether we could use
>> code like:
>>
>>   vec<loop_p, va_gc> *fn_loops = loops_for_fn (fn)->larray;
>>   loop_p aloop;
>>   unsigned int i;
>>   /* Push every innermost loop nested inside TREE_ROOT.  */
>>   for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>>     if (aloop != NULL
>>         && aloop->inner == NULL
>>         && flow_loop_nested_p (tree_root, aloop))
>>       this->to_visit.quick_push (aloop->num);
>>
>> This has a stable bound, but if the given root has only a few
>> child loops while fn has many loops, it can be much worse.  Since
>> the size of the given root's loop hierarchy can't be predicted,
>> maybe we could still use the original linear search for the case
>> loops_for_fn (fn)->tree_root == root?  But since this visiting
>> does not seem very performance-critical, I chose to share the code
>> originally used for FROM_INNERMOST, hoping for better readability
>> and maintainability.
> 
> I was indeed looking for something that has an execution/storage
> bound on the subtree we're interested in.  If we pull the CTOR
> out-of-line we can probably keep the linear search for
> LI_ONLY_INNERMOST when looking at the whole loop tree.
> 

OK, I've moved the suggested single loop tree walker out-of-line
to cfgloop.c, and brought the linear search back for
LI_ONLY_INNERMOST when looking at the whole loop tree.
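The resulting split for LI_ONLY_INNERMOST can be sketched in C++ as follows. The Loop and Loops types are hypothetical, not GCC's. In this sketch, larray is indexed by loop number and tree_root is loop 0. When the root is the whole tree, a linear scan over the table is used. Otherwise, a walk confined to the root's subtree is used, so its cost scales with the subtree rather than with the function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node: inner = first child, next = next sibling,
// outer = enclosing loop.
struct Loop {
  int num;
  Loop *outer = nullptr;
  Loop *inner = nullptr;
  Loop *next = nullptr;
};

// Hypothetical per-function table: LARRAY[i] is loop number i (nullptr
// once removed), and TREE_ROOT is loop 0, the whole function body.
struct Loops {
  std::vector<Loop *> larray;
  Loop *tree_root = nullptr;
};

// Append CHILD as the last child of PARENT.
void add_child(Loop *parent, Loop *child) {
  child->outer = parent;
  Loop **slot = &parent->inner;
  while (*slot != nullptr)
    slot = &(*slot)->next;
  *slot = child;
}

// Collect the innermost loops strictly inside ROOT.
std::vector<int> innermost_loops(const Loops &loops, Loop *root) {
  assert(root != nullptr);
  std::vector<int> to_visit;
  to_visit.reserve(loops.larray.size());  // bounded by the loop count

  if (root == loops.tree_root) {
    // Whole tree: linear scan of the table, O(number of loops in fn).
    for (Loop *l : loops.larray)
      if (l != nullptr && l != root && l->inner == nullptr)
        to_visit.push_back(l->num);
    return to_visit;
  }

  // Subtree: visit only ROOT's descendants, O(size of the subtree).
  Loop *l = root->inner;
  while (l != nullptr) {
    if (l->inner != nullptr) {
      l = l->inner;  // not innermost: go down
      continue;
    }
    to_visit.push_back(l->num);  // innermost loop found
    while (l != root && l->next == nullptr)
      l = l->outer;  // climb, but never above ROOT
    l = (l == root) ? nullptr : l->next;
  }
  return to_visit;
}
```

The two paths can list loops in different orders in general. The scan follows loop numbers, while the walk follows the tree.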

> It just seemed to me that we can eventually re-use a
> single loop tree walker for all orders, just adjusting the
> places we push.
> 

Wow, good point!  Indeed, I have further unified the handling of
all orders into a single function, walk_loop_tree.
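A single walker that serves all three orders might look like the C++ sketch below. The tree traversal is shared, and only the points where loop numbers are pushed depend on the flags. This is not the patch's walk_loop_tree. The Loop type, add_child and the LI_* flag values are hypothetical stand-ins that only echo GCC's flag names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flag values; only the names echo GCC's LI_* flags.
enum : unsigned {
  LI_INCLUDE_ROOT = 1u << 0,
  LI_FROM_INNERMOST = 1u << 1,
  LI_ONLY_INNERMOST = 1u << 2,
};

// Hypothetical loop-tree node: inner = first child, next = next sibling.
struct Loop {
  int num;
  Loop *outer = nullptr;
  Loop *inner = nullptr;
  Loop *next = nullptr;
};

// Append CHILD as the last child of PARENT.
void add_child(Loop *parent, Loop *child) {
  child->outer = parent;
  Loop **slot = &parent->inner;
  while (*slot != nullptr)
    slot = &(*slot)->next;
  *slot = child;
}

// One walker for every order; FLAGS only changes where numbers are pushed:
//   default            preorder: a loop before the loops it contains
//   LI_FROM_INNERMOST  postorder: contained loops before their parent
//   LI_ONLY_INNERMOST  innermost loops only
std::vector<int> walk_loop_tree(Loop *root, unsigned flags) {
  assert(root != nullptr);
  const bool only_innermost = flags & LI_ONLY_INNERMOST;
  const bool from_innermost = flags & LI_FROM_INNERMOST;
  assert(!(only_innermost && from_innermost));  // exclusive in this sketch
  const bool preorder = !only_innermost && !from_innermost;
  std::vector<int> out;

  if (root->inner == nullptr) {  // ROOT is itself innermost
    if (flags & LI_INCLUDE_ROOT)
      out.push_back(root->num);
    return out;
  }
  if (preorder && (flags & LI_INCLUDE_ROOT))
    out.push_back(root->num);

  Loop *l = root->inner;
  while (l != nullptr) {
    while (l->inner != nullptr) {  // go down to the first innermost loop
      if (preorder)
        out.push_back(l->num);
      l = l->inner;
    }
    out.push_back(l->num);  // innermost loops are pushed in every order
    while (l != root && l->next == nullptr) {  // climb to a loop with a sibling
      l = l->outer;
      if (l != root && from_innermost)
        out.push_back(l->num);  // all of L's children are done by now
    }
    l = (l == root) ? nullptr : l->next;  // never step onto ROOT's siblings
  }
  if (from_innermost && (flags & LI_INCLUDE_ROOT))
    out.push_back(root->num);
  return out;
}
```

Take a tree where loop 0 contains loops 1 and 4, and loop 1 contains loops 2 and 3. With LI_INCLUDE_ROOT, the default order gives 0, 1, 2, 3, 4 and LI_FROM_INNERMOST gives 2, 3, 1, 4, 0. With LI_ONLY_INNERMOST, the result is 2, 3, 4.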

>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Does the attached patch meet what you expect?
> 
> So yeah, it's probably close to what is sensible.  Not sure
> whether optimizing the loops for the !only_push_innermost_p
> case is important - if we manage to produce a single
> walker with conditionals based on 'flags' then IPA-CP should
> produce optimal clones as well I guess.
> 

Thanks for the comments; the updated v2 is attached.
Compared with v1, it:

  - Unify one single

Re: [PATCH v2] Make loops_list support an optional loop_p root

2021-08-03 Thread Richard Biener via Gcc-patches
On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin  wrote: