On Thu, 12 Nov 2020, Martin Jambor wrote:

> Hi,
> 
> On Wed, Nov 11 2020, Richard Biener wrote:
> > On Mon, 9 Nov 2020, Martin Jambor wrote:
> >
> >> this patch modifies the loop invariant pass so that it can operate
> >> only on a single requested loop and its sub-loops and ignore the rest
> >> of the function, much like it currently ignores basic blocks that are
> >> not in any real loop.  It then invokes it from within the loop
> >> interchange pass when it successfully swaps two loops.  This avoids
> >> the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
> >> (which are 19% and 15% faster than current master on an AMD zen2
> >> machine) while not introducing a full LIM pass into the pass pipeline.
> >> 
> >> I have not modified the LIM data structures, this means that it still
> >> contains vectors indexed by loop->num even though only a single loop
> >> nest is actually processed.  I also did not replace the uses of
> >> pre_and_rev_post_order_compute_fn with a function that would compute a
> >> postorder only for a given loop.  I can of course do so if the
> >> approach is otherwise deemed viable.
> >> 
> >> The patch adds one additional global variable requested_loop to the
> >> pass and then at various places behaves differently when it is set.  I
> >> was considering storing the fake root loop into it for normal
> >> operation, but since this loop often requires special handling anyway,
> >> I came to the conclusion that the code would actually end up less
> >> straightforward.
> >> 
> >> I have bootstrapped and tested the patch on x86_64-linux and a very
> >> similar one on aarch64-linux.  I have also tested it by modifying the
> >> tree_ssa_lim function to run loop_invariant_motion_from_loop on each
> >> real outermost loop in a function and this variant also passed
> >> bootstrap and all tests, including dump scans, of all languages.
> >> 
> >> I have built the entire SPEC 2006 FPrate monitoring the activity of
> >> the LIM pass without and with the patch (on top of commit b642fca1c31
> >> with which 526.blender_r and 538.imagick_r seemed to be failing) and
> >> it only examined 0.2% more loops, 0.02% more BBs and an even smaller
> >> percentage of statements because it is invoked only in a rather special
> >> circumstance.  But the patch allows for more such need-based uses at
> >> hopefully reasonable cost.
> >> 
> >> Since I do not have much experience with loop optimizers, I expect
> >> that there will be requests to adjust the patch during the review.
> >> Still, it fixes a performance regression against GCC 9 and so I hope
> >> to address the concerns in time to get it into GCC 11.
> >> 
> 
> [...]
> 
> >
> > That said, in the way it's currently structured I think it's
> > "better" to export tree_ssa_lim () and call it from interchange
> > if any loop was interchanged (thus run a full pass but conditional
> > on interchange done).  You can make it cheaper by adding a flag
> > to tree_ssa_lim whether to do store-motion (I guess this might
> > be an interesting user-visible flag as well and a possibility
> > to make select lim passes cheaper via a pass flag) and not do
> > store-motion from the interchange call.  I think that's how we should
> > fix the regression, refactoring LIM properly requires more work
> > that doesn't seem to fit the stage1 deadline.
> >
> 
> So just like this?  Bootstrapped and tested on x86_64-linux and I have
> verified it fixes the bwaves reduction.

OK.

Thanks,
Richard.


> Thanks,
> 
> Martin
> 
> 
> 
> gcc/ChangeLog:
> 
> 2020-11-12  Martin Jambor  <mjam...@suse.cz>
> 
>       PR tree-optimization/94406
>       * tree-ssa-loop-im.c (tree_ssa_lim): Renamed to
>       loop_invariant_motion_in_fun, added a parameter to control store
>       motion.
>       (pass_lim::execute): Adjust call to tree_ssa_lim, now
>       loop_invariant_motion_in_fun.
>       * tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Declare.
>       * gimple-loop-interchange.cc (pass_linterchange::execute): Call
>       loop_invariant_motion_in_fun if any interchange has been done.
> ---
>  gcc/gimple-loop-interchange.cc |  9 +++++++--
>  gcc/tree-ssa-loop-im.c         | 12 +++++++-----
>  gcc/tree-ssa-loop-manip.h      |  2 +-
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index 1656004ecf0..a36dbb49b1f 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -2085,8 +2085,13 @@ pass_linterchange::execute (function *fun)
>      }
>  
>    if (changed_p)
> -    scev_reset ();
> -  return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
> +    {
> +      unsigned todo = TODO_update_ssa_only_virtuals;
> +      todo |= loop_invariant_motion_in_fun (cfun, false);
> +      scev_reset ();
> +      return todo;
> +    }
> +  return 0;
>  }
>  
>  } // anon namespace
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 6bb07e133cd..3c7412737f0 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3089,10 +3089,11 @@ tree_ssa_lim_finalize (void)
>  }
>  
>  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
> > -   i.e. those that are likely to be win regardless of the register pressure.  */
> +   i.e. those that are likely to be win regardless of the register pressure.
> +   Only perform store motion if STORE_MOTION is true.  */
>  
> -static unsigned int
> -tree_ssa_lim (function *fun)
> +unsigned int
> +loop_invariant_motion_in_fun (function *fun, bool store_motion)
>  {
>    unsigned int todo = 0;
>  
> @@ -3114,7 +3115,8 @@ tree_ssa_lim (function *fun)
>  
>    /* Execute store motion.  Force the necessary invariants to be moved
>       out of the loops as well.  */
> -  do_store_motion ();
> +  if (store_motion)
> +    do_store_motion ();
>  
>    free (rpo);
>    rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
> @@ -3175,7 +3177,7 @@ pass_lim::execute (function *fun)
>  
>    if (number_of_loops (fun) <= 1)
>      return 0;
> -  unsigned int todo = tree_ssa_lim (fun);
> +  unsigned int todo = loop_invariant_motion_in_fun (fun, true);
>  
>    if (!in_loop_pipeline)
>      loop_optimizer_finalize ();
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index e789e4fcb0b..d8e918ef7c9 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> > @@ -55,7 +55,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned,
>  extern void tree_unroll_loop (class loop *, unsigned,
>                             edge, class tree_niter_desc *);
>  extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
> -
> +extern unsigned int loop_invariant_motion_in_fun (function *, bool);
>  
>  
>  #endif /* GCC_TREE_SSA_LOOP_MANIP_H */
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer
