Re: Remove redundant calls to deterministic functions

Richard Biener Mon, 18 Nov 2024 01:39:33 -0800

On Sun, 17 Nov 2024, Jan Hubicka wrote:

> Last year I made modref to track which functions are deterministic - i.e. they
> produce same effects given same inputs (including memory) and which functions
> have no side effects (which includes infinite loops, trapping etc.).
> 
> deterministic functions can be CSEed just as looping pure/const.
> These are similar to reproducible/unsequenced attributes.
> 
> This patch implements tree-ssa-sccvn bits so we can now optimize non-pure
> function. For example:
> 
> [[gnu::noinline]]
> int dup(int *a) 
> {
>       a[0]=a[1];
>       return a[2];
> }
> int
> test (int *b)
> {
>       return dup (b) + dup (b);
> }
> 
> Will now be optimized to return 2 * dup (b);
> 
> The sccvn bits are quite simple: visit_stmt has to allow VN lookups for
> deterministic functions while visit_reference_op_call currently contains test
> to give up if function has vdef or no lhs.  I think the vdef test was 
> redundant
> even before since pure/const function can return aggregate.  Newly even if 
> there is
> no lhs we can eliminate redundancies as in:
> 
> [[gnu::noinline]]
> void dup(int *a) 
> {
>       a[0]=a[1];
> }
> void
> test (int *b)
> {
>       dup (b);
>         dup (b);
> }
> 
> I think it is relatively useful feature to optimize out functions that 
> compute some fields
> of structure based on other fields.  It optimizes out 2.6K of clang text
> segment.  This is not much, but I hope it will improve in future.  For 
> example,
> right now dup call can not do sanity checks.  Call to abort makes us believe
> that all global memory is used and that will prevent optimizations. However
> modref can keep track of the fact that global memory read only happens upon
> termination and ignore those accesses while determining redundancy...
> 
> Also reproduce/unsequential makes it possible to ignore stores of function 
> call itself
> to detemrine redundancy.  I am not quite sure how to fit that into PRE.
> 
> bootstrapped/regtested x86_64-linux. OK?
> 
> gcc/ChangeLog:
> 
>       * ipa-modref.cc (get_modref_function_summary): Add extra fn paramaeter
>       to pass value numbering info.
>       (ipa_modref_can_remove_redundat_calls): New function.
>       * ipa-modref.h (get_modref_function_summary): Update declaration.
>       (ipa_modref_can_remove_redundat_calls): declare.
>       * tree-ssa-sccvn.cc (visit_reference_op_call): Also try to optimize 
> calls
>       with vdef and no lhs.
>       (visit_stmt): Use ipa_modref_can_remove_redundat_calls.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/tree-ssa/modref-16.c: New test.
>       * gcc.dg/tree-ssa/modref-17.c: New test.
> 
> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> index 08a7740de94..50d18287219 100644
> --- a/gcc/ipa-modref.cc
> +++ b/gcc/ipa-modref.cc
> @@ -766,12 +766,14 @@ get_modref_function_summary (cgraph_node *func)
>  /* Get function summary for CALL if it exists, return NULL otherwise.
>     If non-null set interposed to indicate whether function may not
>     bind to current def.  In this case sometimes loads from function
> -   needs to be ignored.  */
> +   needs to be ignored.
> +   If FN is non-NULL, it specifies indirect call target (for example when 
> known
> +   to value numbering).  */
>  
>  modref_summary *
> -get_modref_function_summary (gcall *call, bool *interposed)
> +get_modref_function_summary (gcall *call, bool *interposed, tree fn)
>  {
> -  tree callee = gimple_call_fndecl (call);
> +  tree callee = fn ? fn : gimple_call_fndecl (call);
>    if (!callee)
>      return NULL;
>    struct cgraph_node *node = cgraph_node::get (callee);
> @@ -5648,4 +5650,33 @@ ipa_modref_callee_reads_no_memory_p (gcall *call)
>    return false;
>  }
>  
> +/* Return true if CALL can be optimized out provided that was called with
> +   exactly same value number before.
> +
> +   If FN is non-NULL, it specifies indirect call target (for example when 
> known
> +   to value numbering).  */
> +
> +bool
> +ipa_modref_can_remove_redundat_calls (gcall *call, tree fn)
> +{
> +  if (gimple_call_flags (call) & (ECF_CONST | ECF_PURE))
> +    return true;
> +  if (fn && (flags_from_decl_or_type (fn) & (ECF_CONST | ECF_PURE)))
> +    return true;
> +  /* If function has multiple ways to return, we would have to remember
> +     the way it returned last time.
> +     Terminating function execution (i.e. throwing external exception)
> +     is OK.  */
> +  basic_block bb = gimple_bb (call);
> +  if (call == gsi_stmt (gsi_last_bb (bb)) && !single_succ_p (bb))
> +    return false;
> +  modref_summary *sum = get_modref_function_summary (call, NULL, fn);
> +  /* If function is deterministic, we can remove redundant invocations.
> +     However, in current implementaiton, we must track that loads and stores 
> do
> +     not overlap.  Special case the situation that function reads and writes
> +     global memory and therefore we know nothing.  */
> +  return sum && !sum->nondeterministic
> +      && (!sum->global_memory_read || !sum->global_memory_written);
> +}
> +
>  #include "gt-ipa-modref.h"
> diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
> index a0eb63a0afa..5bb4172ef9e 100644
> --- a/gcc/ipa-modref.h
> +++ b/gcc/ipa-modref.h
> @@ -72,10 +72,12 @@ struct GTY(()) modref_summary
>  };
>  
>  modref_summary *get_modref_function_summary (cgraph_node *func);
> -modref_summary *get_modref_function_summary (gcall *call, bool *interposed);
> +modref_summary *get_modref_function_summary (gcall *call, bool *interposed,
> +                                          tree fn = NULL);
>  void ipa_modref_cc_finalize ();
>  void ipa_merge_modref_summary_after_inlining (cgraph_edge *e);
>  bool ipa_modref_callee_reads_no_memory_p (gcall *call);
> +bool ipa_modref_can_remove_redundat_calls (gcall *call, tree fn = NULL);
>  
>  /* All flags that are implied by the ECF_CONST functions.  */
>  static const int implicit_const_eaf_flags
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c
> new file mode 100644
> index 00000000000..702349bbc02
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-release_ssa"  } */
> +[[gnu::noinline]]
> +int dup(int *a) 
> +{
> +     a[0]=a[1];
> +     return a[2];
> +}
> +int
> +test (int *b)
> +{
> +     return dup (b) + dup (b);
> +}
> +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c
> new file mode 100644
> index 00000000000..703cb46fabb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-options "-O2 -fdump-tree-release_ssa"  } */
> +[[gnu::noinline]]
> +void dup(int *a) 
> +{
> +     a[0]=a[1];
> +}
> +void
> +test (int *b)
> +{
> +     dup (b);
> +       dup (b);
> +}
> +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index e93acb44200..6e31f8b3cfe 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -5567,8 +5567,6 @@ visit_reference_op_call (tree lhs, gcall *stmt)
>       modref info to find a candidate to CSE to.  */
>    const unsigned accesses_limit = 8;
>    if (!vnresult
> -      && !vdef
> -      && lhs


I don't think we handle

 mem = foo ();

correctly in VN, the && lhs condition also guards this.  Maybe
instead refactor this and the check a few lines above to check
(!lhs || TREE_CODE (lhs) == SSA_NAME)

?  The VUSE->VDEF chain walking also doesn't consider the call
having memory side-effects since it effectively skips intermittent
uses.  So I believe you have to adjust (or guard) that as well,
alternatively visit all uses for each def walked.

>        && gimple_vuse (stmt)
>        && (((summary = get_modref_function_summary (stmt, NULL))
>          && !summary->global_memory_read
> @@ -6354,19 +6352,18 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = 
> false)
>  
>        /* Pick up flags from a devirtualization target.  */
>        tree fn = gimple_call_fn (stmt);
> -      int extra_fnflags = 0;
>        if (fn && TREE_CODE (fn) == SSA_NAME)
>       {
>         fn = SSA_VAL (fn);
>         if (TREE_CODE (fn) == ADDR_EXPR
>             && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL)
> -         extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0));
> +         fn = TREE_OPERAND (fn, 0);
> +       else
> +         fn = NULL;
>       }
> -      if ((/* Calls to the same function with the same vuse
> -           and the same operands do not necessarily return the same
> -           value, unless they're pure or const.  */
> -        ((gimple_call_flags (call_stmt) | extra_fnflags)
> -         & (ECF_PURE | ECF_CONST))
> +      else
> +     fn = NULL;
> +      if ((ipa_modref_can_remove_redundat_calls (call_stmt, fn)
>          /* If calls have a vdef, subsequent calls won't have
>             the same incoming vuse.  So, if 2 calls with vdef have the
>             same vuse, we know they're not subsequent.

With disregarding VDEF this comment is now wrong (it's directed at
tail-merging btw).

I'll note that elimination will only be able to "DCE" calls with a
LHS since "DCE" happens by replacing the RHS.  That's also what the
&& lhs check is about - we don't do anything useful during elimination
when there's no LHS but the call itself is present in the expression
hash.

Richard.

Re: Remove redundant calls to deterministic functions

Reply via email to