On Sun, 17 Nov 2024, Jan Hubicka wrote:
> Last year I made modref to track which functions are deterministic - i.e. they
> produce same effects given same inputs (including memory) and which functions
> have no side effects (which includes infinite loops, trapping etc.).
>
> deterministic functions can be CSEed just as looping pure/const.
> These are similar to reproducible/unsequenced attributes.
>
> This patch implements tree-ssa-sccvn bits so we can now optimize non-pure
> function. For example:
>
> [[gnu::noinline]]
> int dup(int *a)
> {
> a[0]=a[1];
> return a[2];
> }
> int
> test (int *b)
> {
> return dup (b) + dup (b);
> }
>
> Will now be optimized to return 2 * dup (b);
>
> The sccvn bits are quite simple: visit_stmt has to allow VN lookups for
> deterministic functions while visit_reference_op_call currently contains test
> to give up if function has vdef or no lhs. I think the vdef test was
> redundant
> even before since pure/const function can return aggregate. Newly even if
> there is
> no lhs we can eliminate redundancies as in:
>
> [[gnu::noinline]]
> void dup(int *a)
> {
> a[0]=a[1];
> }
> void
> test (int *b)
> {
> dup (b);
> dup (b);
> }
>
> I think it is relatively useful feature to optimize out functions that
> compute some fields
> of structure based on other fields. It optimizes out 2.6K of clang text
> segment. This is not much, but I hope it will improve in future. For
> example,
> right now dup call can not do sanity checks. Call to abort makes us believe
> that all global memory is used and that will prevent optimizations. However
> modref can keep track of the fact that global memory read only happens upon
> termination and ignore those accesses while determining redundancy...
>
> Also reproduce/unsequential makes it possible to ignore stores of function
> call itself
> to detemrine redundancy. I am not quite sure how to fit that into PRE.
>
> bootstrapped/regtested x86_64-linux. OK?
>
> gcc/ChangeLog:
>
> * ipa-modref.cc (get_modref_function_summary): Add extra fn paramaeter
> to pass value numbering info.
> (ipa_modref_can_remove_redundat_calls): New function.
> * ipa-modref.h (get_modref_function_summary): Update declaration.
> (ipa_modref_can_remove_redundat_calls): declare.
> * tree-ssa-sccvn.cc (visit_reference_op_call): Also try to optimize
> calls
> with vdef and no lhs.
> (visit_stmt): Use ipa_modref_can_remove_redundat_calls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/modref-16.c: New test.
> * gcc.dg/tree-ssa/modref-17.c: New test.
>
> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> index 08a7740de94..50d18287219 100644
> --- a/gcc/ipa-modref.cc
> +++ b/gcc/ipa-modref.cc
> @@ -766,12 +766,14 @@ get_modref_function_summary (cgraph_node *func)
> /* Get function summary for CALL if it exists, return NULL otherwise.
> If non-null set interposed to indicate whether function may not
> bind to current def. In this case sometimes loads from function
> - needs to be ignored. */
> + needs to be ignored.
> + If FN is non-NULL, it specifies indirect call target (for example when
> known
> + to value numbering). */
>
> modref_summary *
> -get_modref_function_summary (gcall *call, bool *interposed)
> +get_modref_function_summary (gcall *call, bool *interposed, tree fn)
> {
> - tree callee = gimple_call_fndecl (call);
> + tree callee = fn ? fn : gimple_call_fndecl (call);
> if (!callee)
> return NULL;
> struct cgraph_node *node = cgraph_node::get (callee);
> @@ -5648,4 +5650,33 @@ ipa_modref_callee_reads_no_memory_p (gcall *call)
> return false;
> }
>
> +/* Return true if CALL can be optimized out provided that was called with
> + exactly same value number before.
> +
> + If FN is non-NULL, it specifies indirect call target (for example when
> known
> + to value numbering). */
> +
> +bool
> +ipa_modref_can_remove_redundat_calls (gcall *call, tree fn)
> +{
> + if (gimple_call_flags (call) & (ECF_CONST | ECF_PURE))
> + return true;
> + if (fn && (flags_from_decl_or_type (fn) & (ECF_CONST | ECF_PURE)))
> + return true;
> + /* If function has multiple ways to return, we would have to remember
> + the way it returned last time.
> + Terminating function execution (i.e. throwing external exception)
> + is OK. */
> + basic_block bb = gimple_bb (call);
> + if (call == gsi_stmt (gsi_last_bb (bb)) && !single_succ_p (bb))
> + return false;
> + modref_summary *sum = get_modref_function_summary (call, NULL, fn);
> + /* If function is deterministic, we can remove redundant invocations.
> + However, in current implementaiton, we must track that loads and stores
> do
> + not overlap. Special case the situation that function reads and writes
> + global memory and therefore we know nothing. */
> + return sum && !sum->nondeterministic
> + && (!sum->global_memory_read || !sum->global_memory_written);
> +}
> +
> #include "gt-ipa-modref.h"
> diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
> index a0eb63a0afa..5bb4172ef9e 100644
> --- a/gcc/ipa-modref.h
> +++ b/gcc/ipa-modref.h
> @@ -72,10 +72,12 @@ struct GTY(()) modref_summary
> };
>
> modref_summary *get_modref_function_summary (cgraph_node *func);
> -modref_summary *get_modref_function_summary (gcall *call, bool *interposed);
> +modref_summary *get_modref_function_summary (gcall *call, bool *interposed,
> + tree fn = NULL);
> void ipa_modref_cc_finalize ();
> void ipa_merge_modref_summary_after_inlining (cgraph_edge *e);
> bool ipa_modref_callee_reads_no_memory_p (gcall *call);
> +bool ipa_modref_can_remove_redundat_calls (gcall *call, tree fn = NULL);
>
> /* All flags that are implied by the ECF_CONST functions. */
> static const int implicit_const_eaf_flags
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c
> new file mode 100644
> index 00000000000..702349bbc02
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-release_ssa" } */
> +[[gnu::noinline]]
> +int dup(int *a)
> +{
> + a[0]=a[1];
> + return a[2];
> +}
> +int
> +test (int *b)
> +{
> + return dup (b) + dup (b);
> +}
> +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c
> new file mode 100644
> index 00000000000..703cb46fabb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-options "-O2 -fdump-tree-release_ssa" } */
> +[[gnu::noinline]]
> +void dup(int *a)
> +{
> + a[0]=a[1];
> +}
> +void
> +test (int *b)
> +{
> + dup (b);
> + dup (b);
> +}
> +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index e93acb44200..6e31f8b3cfe 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -5567,8 +5567,6 @@ visit_reference_op_call (tree lhs, gcall *stmt)
> modref info to find a candidate to CSE to. */
> const unsigned accesses_limit = 8;
> if (!vnresult
> - && !vdef
> - && lhs
I don't think we handle
mem = foo ();
correctly in VN, the && lhs condition also guards this. Maybe
instead refactor this and the check a few lines above to check
(!lhs || TREE_CODE (lhs) == SSA_NAME)
? The VUSE->VDEF chain walking also doesn't consider the call
having memory side-effects since it effectively skips intermittent
uses. So I believe you have to adjust (or guard) that as well,
alternatively visit all uses for each def walked.
> && gimple_vuse (stmt)
> && (((summary = get_modref_function_summary (stmt, NULL))
> && !summary->global_memory_read
> @@ -6354,19 +6352,18 @@ visit_stmt (gimple *stmt, bool backedges_varying_p =
> false)
>
> /* Pick up flags from a devirtualization target. */
> tree fn = gimple_call_fn (stmt);
> - int extra_fnflags = 0;
> if (fn && TREE_CODE (fn) == SSA_NAME)
> {
> fn = SSA_VAL (fn);
> if (TREE_CODE (fn) == ADDR_EXPR
> && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL)
> - extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0));
> + fn = TREE_OPERAND (fn, 0);
> + else
> + fn = NULL;
> }
> - if ((/* Calls to the same function with the same vuse
> - and the same operands do not necessarily return the same
> - value, unless they're pure or const. */
> - ((gimple_call_flags (call_stmt) | extra_fnflags)
> - & (ECF_PURE | ECF_CONST))
> + else
> + fn = NULL;
> + if ((ipa_modref_can_remove_redundat_calls (call_stmt, fn)
> /* If calls have a vdef, subsequent calls won't have
> the same incoming vuse. So, if 2 calls with vdef have the
> same vuse, we know they're not subsequent.
With disregarding VDEF this comment is now wrong (it's directed at
tail-merging btw).
I'll note that elimination will only be able to "DCE" calls with a
LHS since "DCE" happens by replacing the RHS. That's also what the
&& lhs check is about - we don't do anything useful during elimination
when there's no LHS but the call itself is present in the expression
hash.
Richard.