on 2021/8/4 下午8:04, Richard Biener wrote: > On Wed, Aug 4, 2021 at 12:47 PM Kewen.Lin <li...@linux.ibm.com> wrote: >> >> on 2021/8/4 下午6:01, Richard Biener wrote: >>> On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin <li...@linux.ibm.com> wrote: >>>> >>>> on 2021/8/3 下午8:08, Richard Biener wrote: >>>>> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin <li...@linux.ibm.com> wrote: >>>>>> >>>>>> on 2021/7/29 下午4:01, Richard Biener wrote: >>>>>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote: >>>>>>>> >>>>>>>> on 2021/7/22 下午8:56, Richard Biener wrote: >>>>>>>>> On Tue, Jul 20, 2021 at 4:37 >>>>>>>>> PM Kewen.Lin <li...@linux.ibm.com> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> This v2 has addressed some review comments/suggestions: >>>>>>>>>> >>>>>>>>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>>>>>>>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>>>>>>>>> to support loop hierarchy tree rather than just a function, >>>>>>>>>> and adjust to use loops* accordingly. >>>>>>>>> >>>>>>>>> I actually meant struct loop *, not struct loops * ;) At the point >>>>>>>>> we pondered to make loop invariant motion work on single >>>>>>>>> loop nests we gave up not only but also because it iterates >>>>>>>>> over the loop nest but all the iterators only ever can process >>>>>>>>> all loops, not say, all loops inside a specific 'loop' (and >>>>>>>>> including that 'loop' if LI_INCLUDE_ROOT). So the >>>>>>>>> CTOR would take the 'root' of the loop tree as argument. >>>>>>>>> >>>>>>>>> I see that doesn't trivially fit how loops_list works, at least >>>>>>>>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >>>>>>>>> could be adjusted to do ONLY_INNERMOST as well? >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks for the clarification! I just realized that the previous >>>>>>>> version with struct loops* is problematic, all traversal is >>>>>>>> still bounded with outer_loop == NULL. I think what you expect >>>>>>>> is to respect the given loop_p root boundary. Since we just >>>>>>>> record the loops' nums, I think we still need the function* fn? >>>>>>> >>>>>>> Would it simplify things if we recorded the actual loop *? >>>>>>> >>>>>> >>>>>> I'm afraid it's unsafe to record the loop*. I had the same >>>>>> question why the loop iterator uses index rather than loop* when >>>>>> I read this at the first time. I guess the design of processing >>>>>> loops allows its user to update or even delete the folllowing >>>>>> loops to be visited. For example, when the user does some tricks >>>>>> on one loop, then it duplicates the loop and its children to >>>>>> somewhere and then removes the loop and its children, when >>>>>> iterating onto its children later, the "index" way will check its >>>>>> validity by get_loop at that point, but the "loop *" way will >>>>>> have some recorded pointers to become dangling, can't do the >>>>>> validity check on itself, seems to need a side linear search to >>>>>> ensure the validity. >>>>>> >>>>>>> There's still the to_visit reserve which needs a bound on >>>>>>> the number of loops for efficiency reasons. >>>>>>> >>>>>> >>>>>> Yes, I still keep the fn in the updated version. >>>>>> >>>>>>>> So I add one optional argument loop_p root and update the >>>>>>>> visiting codes accordingly. Before this change, the previous >>>>>>>> visiting uses the outer_loop == NULL as the termination condition, >>>>>>>> it perfectly includes the root itself, but with this given root, >>>>>>>> we have to use it as the termination condition to avoid to iterate >>>>>>>> onto its possible existing next. >>>>>>>> >>>>>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the >>>>>>>> code like: >>>>>>>> >>>>>>>> struct loops *fn_loops = loops_for_fn (fn)->larray; >>>>>>>> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) >>>>>>>> if (aloop != NULL >>>>>>>> && aloop->inner == NULL >>>>>>>> && flow_loop_nested_p (tree_root, aloop)) >>>>>>>> this->to_visit.quick_push (aloop->num); >>>>>>>> >>>>>>>> it has the stable bound, but if the given root only has several >>>>>>>> child loops, it can be much worse if there are many loops in fn. >>>>>>>> It seems impossible to predict the given root loop hierarchy size, >>>>>>>> maybe we can still use the original linear searching for the case >>>>>>>> loops_for_fn (fn) == root? But since this visiting seems not so >>>>>>>> performance critical, I chose to share the code originally used >>>>>>>> for FROM_INNERMOST, hope it can have better readability and >>>>>>>> maintainability. >>>>>>> >>>>>>> I was indeed looking for something that has execution/storage >>>>>>> bound on the subtree we're interested in. If we pull the CTOR >>>>>>> out-of-line we can probably keep the linear search for >>>>>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>>>>> >>>>>> >>>>>> OK, I've moved the suggested single loop tree walker out-of-line >>>>>> to cfgloop.c, and brought the linear search back for >>>>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>>>> >>>>>>> It just seemed to me that we can eventually re-use a >>>>>>> single loop tree walker for all orders, just adjusting the >>>>>>> places we push. >>>>>>> >>>>>> >>>>>> Wow, good point! Indeed, I have further unified all orders >>>>>> handlings into a single function walk_loop_tree. >>>>>> >>>>>>>> >>>>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9, >>>>>>>> x86_64-redhat-linux and aarch64-linux-gnu, also >>>>>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config. >>>>>>>> >>>>>>>> Does the attached patch meet what you expect? >>>>>>> >>>>>>> So yeah, it's probably close to what is sensible. Not sure >>>>>>> whether optimizing the loops for the !only_push_innermost_p >>>>>>> case is important - if we manage to produce a single >>>>>>> walker with conditionals based on 'flags' then IPA-CP should >>>>>>> produce optimal clones as well I guess. >>>>>>> >>>>>> >>>>>> Thanks for the comments, the updated v2 is attached. >>>>>> Comparing with v1, it does: >>>>>> >>>>>> - Unify one single loop tree walker for all orders. >>>>>> - Move walk_loop_tree out-of-line to cfgloop.c. >>>>>> - Keep the linear search for LI_ONLY_INNERMOST with >>>>>> tree_root of fn loops. >>>>>> - Use class loop * instead of loop_p. >>>>>> >>>>>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9 >>>>>> (with/without the hunk for LI_ONLY_INNERMOST linear search, >>>>>> it can have the coverage to exercise LI_ONLY_INNERMOST >>>>>> in walk_loop_tree when "without"). >>>>>> >>>>>> Is it ok for trunk? >>>>> >>>>> Looks good to me. I think that the 'mn' was an optimization >>>>> for the linear walk and it's cheaper to pointer test against >>>>> the actual 'root' loop (no need to dereference). Thus >>>>> >>>>> + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) >>>>> { >>>>> - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, >>>>> &aloop); i++) >>>>> + class loop *aloop; >>>>> + unsigned int i; >>>>> + for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++) >>>>> if (aloop != NULL >>>>> && aloop->inner == NULL >>>>> - && aloop->num >= mn) >>>>> + && aloop->num != mn) >>>>> this->to_visit.quick_push (aloop->num); >>>>> >>>>> could elide the aloop->num != mn check and start iterating from 1, >>>>> since loops->tree_root->num == 0 >>>>> >>>>> and the walk_loop_tree could simply do >>>>> >>>>> class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; >>>>> >>>>> and pointer test aloop against exclude. That avoids the idea that >>>>> 'mn' is a vehicle to exclude one random loop from the iteration. >>>>> >>>> >>>> Good idea! Thanks for the comments! The attached v3 has addressed >>>> the review comments on "mn". >>>> >>>> Bootstrapped & regtested again on powerpc64le-linux-gnu Power9 >>>> (with/without the hunk for LI_ONLY_INNERMOST linear search). >>>> >>>> Is it ok for trunk? >>> >>> + /* Early handle root without any inner loops, make later >>> + processing simpler, that is all loops processed in the >>> + following while loop are impossible to be root. */ >>> + if (!root->inner) >>> + { >>> + if (root != exclude) >>> + this->to_visit.quick_push (root->num); >>> + return; >>> + } >>> >>> could be >>> >>> if (!root->inner) >>> { >>> if (flags & LI_INCLUDE_ROOT) >>> this->to_visit.quick_push (root->num); >>> } >>> >> >> OK, I thought wrongly that all places with "exclude" might be >> more consistent, so gave up to use flags directly. :) >> >>> + class loop *aloop; >>> + for (aloop = root; >>> + aloop->inner != NULL; >>> + aloop = aloop->inner) >>> + { >>> + if (preorder_p && aloop != exclude) >>> + this->to_visit.quick_push (aloop->num); >>> + continue; >>> + } >>> >>> could be >>> >>> + class loop *aloop; >>> + for (aloop = root->inner; >>> + aloop->inner != NULL; >>> + aloop = aloop->inner) >>> + { >>> + if (preorder_p) >>> + this->to_visit.quick_push (aloop->num); >>> + continue; >>> + } >>> >> >> This seems wrong? For preorder_p, we might miss to push root >> when root->inner isn't NULL. The below "else if" makes it safe. > > oops, yes. > >> @@ -2125,17 +2125,19 @@ loops_list::walk_loop_tree (class loop *root, >> unsigned flags) >> following while loop are impossible to be root. */ >> if (!root->inner) >> { >> - if (root != exclude) >> + if (flags & LI_INCLUDE_ROOT) >> this->to_visit.quick_push (root->num); >> return; >> } >> + else if (preorder_p && flags & LI_INCLUDE_ROOT) >> + this->to_visit.quick_push (root->num); >> >>> + /* When visiting from innermost, we need to consider root here >>> + since the previous while loop doesn't handle it. */ >>> + if (from_innermost_p && root != exclude) >>> + this->to_visit.quick_push (root->num); >>> >>> could be like the first. I think that's more clear even. Sorry for >>> finding a better solution again. >>> >> >> It's totally fine, thanks for all the nice suggestions! :) >> >>> OK with that change >>> >> >> Thanks, the attached diff is the delta against v3, excepting for >> the "else if", the other changes follow the suggestion above. >> >> Could you have another look to confirm? > > I'm missing the line that removes 'exclude', other than that it looks > OK. >
Thanks! Bootstrapped & regress-tested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. Committed in r12-2756. BR, Kewen