On Thu, 12 Nov 2020, Martin Jambor wrote:

> Hi,
>
> On Wed, Nov 11 2020, Richard Biener wrote:
> > On Mon, 9 Nov 2020, Martin Jambor wrote:
> >
> >> this patch modifies the loop invariant pass so that it can operate
> >> only on a single requested loop and its sub-loops and ignore the rest
> >> of the function, much like it currently ignores basic blocks that are
> >> not in any real loop.  It then invokes it from within the loop
> >> interchange pass when it successfully swaps two loops.  This avoids
> >> the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
> >> (which are 19% and 15% faster than current master on an AMD zen2
> >> machine) while not introducing a full LIM pass into the pass pipeline.
> >>
> >> I have not modified the LIM data structures, which means the pass
> >> still contains vectors indexed by loop->num even though only a single
> >> loop nest is actually processed.  I also did not replace the uses of
> >> pre_and_rev_post_order_compute_fn with a function that would compute
> >> a postorder only for a given loop.  I can of course do so if the
> >> approach is otherwise deemed viable.
> >>
> >> The patch adds one additional global variable, requested_loop, to the
> >> pass, which then behaves differently at various places when it is
> >> set.  I considered storing the fake root loop into it for normal
> >> operation, but since this loop often requires special handling
> >> anyway, I came to the conclusion that the code would actually end up
> >> less straightforward.
> >>
> >> I have bootstrapped and tested the patch on x86_64-linux and a very
> >> similar one on aarch64-linux.  I have also tested it by modifying the
> >> tree_ssa_lim function to run loop_invariant_motion_from_loop on each
> >> real outermost loop in a function; this variant also passed
> >> bootstrap and all tests, including dump scans, of all languages.
> >>
> >> I have built the entire SPEC 2006 FPrate suite while monitoring the
> >> activity of the LIM pass without and with the patch (on top of commit
> >> b642fca1c31, with which 526.blender_r and 538.imagick_r seemed to be
> >> failing) and it examined only 0.2% more loops, 0.02% more BBs and an
> >> even smaller percentage of statements, because it is invoked only in
> >> a rather special circumstance.  But the patch allows for more such
> >> need-based uses at hopefully reasonable cost.
> >>
> >> Since I do not have much experience with loop optimizers, I expect
> >> that there will be requests to adjust the patch during the review.
> >> Still, it fixes a performance regression against GCC 9 and so I hope
> >> to address the concerns in time to get it into GCC 11.
> >>
> > [...]
> >
> > That said, in the way it's currently structured I think it's
> > "better" to export tree_ssa_lim () and call it from interchange
> > if any loop was interchanged (thus run a full pass, but conditional
> > on interchange having been done).  You can make it cheaper by adding
> > a flag to tree_ssa_lim controlling whether to do store motion (I
> > guess this might be an interesting user-visible flag as well and a
> > possibility to make select lim passes cheaper via a pass flag) and
> > not do store motion from the interchange call.  I think that's how we
> > should fix the regression; refactoring LIM properly requires more
> > work that doesn't seem to fit the stage1 deadline.
> >
>
> So just like this?  Bootstrapped and tested on x86_64-linux, and I have
> verified it fixes the bwaves regression.
OK.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2020-11-12  Martin Jambor  <mjam...@suse.cz>
>
> 	PR tree-optimization/94406
> 	* tree-ssa-loop-im.c (tree_ssa_lim): Renamed to
> 	loop_invariant_motion_in_fun, added a parameter to control store
> 	motion.
> 	(pass_lim::execute): Adjust call to tree_ssa_lim, now
> 	loop_invariant_motion_in_fun.
> 	* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Declare.
> 	* gimple-loop-interchange.cc (pass_linterchange::execute): Call
> 	loop_invariant_motion_in_fun if any interchange has been done.
> ---
>  gcc/gimple-loop-interchange.cc |  9 +++++++--
>  gcc/tree-ssa-loop-im.c         | 12 +++++++-----
>  gcc/tree-ssa-loop-manip.h      |  2 +-
>  3 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index 1656004ecf0..a36dbb49b1f 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -2085,8 +2085,13 @@ pass_linterchange::execute (function *fun)
>      }
>
>    if (changed_p)
> -    scev_reset ();
> -  return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
> +    {
> +      unsigned todo = TODO_update_ssa_only_virtuals;
> +      todo |= loop_invariant_motion_in_fun (cfun, false);
> +      scev_reset ();
> +      return todo;
> +    }
> +  return 0;
>  }
>
>  } // anon namespace
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 6bb07e133cd..3c7412737f0 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3089,10 +3089,11 @@ tree_ssa_lim_finalize (void)
>  }
>
>  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
> -   i.e. those that are likely to be win regardless of the register pressure.  */
> +   i.e. those that are likely to be win regardless of the register pressure.
> +   Only perform store motion if STORE_MOTION is true.
> +   */
>
> -static unsigned int
> -tree_ssa_lim (function *fun)
> +unsigned int
> +loop_invariant_motion_in_fun (function *fun, bool store_motion)
>  {
>    unsigned int todo = 0;
>
> @@ -3114,7 +3115,8 @@
>
>    /* Execute store motion.  Force the necessary invariants to be moved
>       out of the loops as well.  */
> -  do_store_motion ();
> +  if (store_motion)
> +    do_store_motion ();
>
>    free (rpo);
>    rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
> @@ -3175,7 +3177,7 @@ pass_lim::execute (function *fun)
>
>    if (number_of_loops (fun) <= 1)
>      return 0;
> -  unsigned int todo = tree_ssa_lim (fun);
> +  unsigned int todo = loop_invariant_motion_in_fun (fun, true);
>
>    if (!in_loop_pipeline)
>      loop_optimizer_finalize ();
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index e789e4fcb0b..d8e918ef7c9 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -55,7 +55,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned,
>  extern void tree_unroll_loop (class loop *, unsigned,
>  			      edge, class tree_niter_desc *);
>  extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
> -
> +extern unsigned int loop_invariant_motion_in_fun (function *, bool);
>
>
>  #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
> --

Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend
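
[Editorial note: the sketch below is not GCC code. The TODO_* flag values, the
pretend TODO_cleanup_cfg bit and the function bodies are illustrative stand-ins;
it only models the control flow the patch introduces: loop_invariant_motion_in_fun
gains a bool to skip store motion, and the interchange pass calls it only when an
interchange actually happened, OR-ing the returned TODO bits into its own.]

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's pass-manager TODO_* bit flags.
constexpr unsigned TODO_update_ssa_only_virtuals = 1u << 0;
constexpr unsigned TODO_cleanup_cfg = 1u << 1;

// Models the renamed LIM entry point: perform invariant motion, do store
// motion only when asked, and return TODO flags for the pass manager.
unsigned loop_invariant_motion_in_fun (bool store_motion)
{
  unsigned todo = 0;
  // ... move invariants out of loops; pretend that made a CFG cleanup
  // necessary, so the caller gets a non-trivial TODO bit back ...
  todo |= TODO_cleanup_cfg;
  if (store_motion)
    {
      // ... execute store motion (skipped for the interchange call) ...
    }
  return todo;
}

// Models pass_linterchange::execute after the patch: run LIM without
// store motion only when an interchange changed something.
unsigned interchange_execute (bool changed_p)
{
  if (changed_p)
    {
      unsigned todo = TODO_update_ssa_only_virtuals;
      todo |= loop_invariant_motion_in_fun (/*store_motion=*/false);
      return todo;
    }
  return 0;
}
```

When nothing was interchanged the extra LIM run costs nothing at all, which is
why the thread settles on this shape rather than adding a full LIM pass to the
pipeline.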