On Thu, 5 Nov 2020, Jan Hubicka wrote:

> > 
> > On 10/27/20 3:01 AM, Richard Biener wrote:
> > > On Tue, 27 Oct 2020, Jan Hubicka wrote:
> > >
> > >>> On Mon, 26 Oct 2020, Jan Hubicka wrote:
> > >>>
> > >>>> Hi,
> > >>>> while looking for special cases of buitins I noticed that tree-ssa-ccp
> > >>>> can use EAF_RETURNS_ARG.  I wonder if same should be done by value
> > >>>> numbering and other propagators....
> > >>> The issue is that changing
> > >>>
> > >>>   q = memcpy (p, r);
> > >>>   .. use q ...
> > >>>
> > >>> to
> > >>>
> > >>>   memcpy (p, r);
> > >>>   .. use p ..
> > >>>
> > >>> is bad for RA so we generally do not want to copy-propagate
> > >>> EAF_RETURNS_ARG.  We eventually do want to optimize a following
> > >>>
> > >>>
> > >>>   if (q == p)
> > >>>
> > >>> of course.  And we eventually want to do the _reverse_ transform,
> > >>> replacing
> > >>>
> > >>>   memcpy (p, r)
> > >>>   .. use p ..
> > >>>
> > >>> with
> > >>>
> > >>>   tem = memcpy (p, r)
> > >>>   .. use tem ..
> > >>>
> > >>> ISTR playing with patches doing all of the above, would need to dig
> > >>> them out again.  There's also a PR about this I think.
> > >>>
> > >>> Bernd added some code to RTL call expansion, not sure exactly
> > >>> what it does...
> > >> It adds copy intstruction to call fusage, so RTL backend now about the
> > >> equivalence.
> > >> void *
> > >> test(void *a, void *b, int l)
> > >> {
> > >>   __builtin_memcpy (a,b,l);
> > >>   return a;
> > >> }
> > >> eliminates the extra copy. So I would say that we should not be affraid
> > >> to propagate in gimple world. It is a minor thing I guess though.
> > >> (my interest is mostly to get rid of unnecesary special casing of
> > >> builtins, as these special cases are clearly not well maintained
> > >> because almost no one knows about them:)
> > > The complication is when this appears in a loop like
> > >
> > >  for (; n; --n)
> > >    {
> > >      p = memcpy (p, s, k);
> > >      p += j;
> > >    }
> > >
> > > then I assume IVOPTs can do a better job knowing the equivalence
> > > (guess we'd still need to teach SCEV about this then ...) and
> > > when it's not present explicitely in the SSA chain any SSA based
> > > analysis has difficulties seeing it.
> > >
> > > ISTR I saw regressions when doing a patch propagating those
> > > equivalences.
> > 
> > SImilarly.? I don't remember the details, but definitely remember being
> > surprised that the propagation caused regressions and then chasing it
> > down to a bad interaction with the register allocator.
> 
> I wonder if it was before or after the code in calls.c adding
> CALL_FUSAGE was added.  It is probably not that important, but given
> that we have all infrastructure on place it seems pity to not use it.

The CALL_FUSAGE was a band-aid I think - RA still doesn't know how
to re-materialize regs from the return value in general.

Btw, below is some WIP patch teaching FRE about this but instead
of propagating out the return value (which would be simpler in the
patch) it tries to do the reverse and eliminate to the most
downstream call.  It has some issues with valueization at -O1
though so not all simplification opportunities are realized.

As said, propagating out the return value is easy but that loses
the LHS of the calls in the end which is probably why the
CALL_FUSAGE thing doesn't help.  If there's no return value
there's noting to use.

Richard.

>From 7e96e97c33a533b74dd9d16634bb7a8acce6ca19 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Fri, 6 Nov 2020 10:47:00 +0100
Subject: [PATCH] teach FRE about ERF_RETURNS_ARG
To: gcc-patches@gcc.gnu.org

This teaches value-numbering about calls returning one of their
arguments.  Elimination makes sure to use the return value
downstream of calls.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-89.c | 16 ++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-90.c | 15 +++++++++
 gcc/tree-ssa-sccvn.c                       | 37 +++++++++++++++++-----
 3 files changed, 60 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-89.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-90.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-89.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-89.c
new file mode 100644
index 00000000000..6d62510ed9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-89.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+void *x, *y;
+int foo (void *q, void *p)
+{
+  void *r = __builtin_memcpy (q, p, 17);
+  x = r;
+  y = q;
+  return r != q;
+}
+
+/* { dg-final { scan-tree-dump "x = r" "fre1" } } */
+/* { dg-final { scan-tree-dump "y = r" "fre1" } } */
+/* We fail simplifying r != q for the lattice.  */
+/* { dg-final { scan-tree-dump "return 0;" "fre1" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-90.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-90.c
new file mode 100644
index 00000000000..1913de9c8ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-90.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+void *x, *y;
+int foo (void *q, void *p)
+{
+  void *r = __builtin_memcpy (q, p, 17);
+  x = r;
+  y = q;
+  return r != q;
+}
+
+/* { dg-final { scan-tree-dump "x = r" "fre1" } } */
+/* { dg-final { scan-tree-dump "y = r" "fre1" } } */
+/* { dg-final { scan-tree-dump "return 0;" "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index c139adb6130..42928a6ec69 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -5493,7 +5493,21 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = 
false)
                  && default_vn_walk_kind == VN_WALK)))
        changed = visit_reference_op_call (lhs, call_stmt);
       else
-       changed = defs_to_varying (call_stmt);
+       {
+         changed = false;
+         if (tree vdef = gimple_vdef (call_stmt))
+           changed |= set_ssa_val_to (vdef, vdef);
+         if (lhs)
+           {
+             int rf = gimple_call_return_flags (call_stmt);
+             if (rf & ERF_RETURNS_ARG)
+               changed |= visit_copy (lhs,
+                                      gimple_call_arg (call_stmt,
+                                                       rf & 
ERF_RETURN_ARG_MASK));
+             else
+               changed |= set_ssa_val_to (lhs, lhs);
+           }
+       }
     }
   else
     changed = defs_to_varying (stmt);
@@ -5733,10 +5747,12 @@ eliminate_dom_walker::eliminate_avail (basic_block, 
tree op)
   tree valnum = VN_INFO (op)->valnum;
   if (TREE_CODE (valnum) == SSA_NAME)
     {
-      if (SSA_NAME_IS_DEFAULT_DEF (valnum))
-       return valnum;
+      tree leader = NULL_TREE;
       if (avail.length () > SSA_NAME_VERSION (valnum))
-       return avail[SSA_NAME_VERSION (valnum)];
+       leader = avail[SSA_NAME_VERSION (valnum)];
+      if (!leader && SSA_NAME_IS_DEFAULT_DEF (valnum))
+       leader = valnum;
+      return leader;
     }
   else if (is_gimple_min_invariant (valnum))
     return valnum;
@@ -5855,7 +5871,7 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, 
gimple_stmt_iterator *gsi)
   gimple *stmt = gsi_stmt (*gsi);
   tree lhs = gimple_get_lhs (stmt);
   if (lhs && TREE_CODE (lhs) == SSA_NAME
-      && !gimple_has_volatile_ops (stmt)
+      && !gimple_has_side_effects (stmt)
       /* See PR43491.  Do not replace a global register variable when
         it is a the RHS of an assignment.  Do replace local register
         variables since gcc does not guarantee a local variable will
@@ -6766,11 +6782,12 @@ rpo_elim::eliminate_avail (basic_block bb, tree op)
     return op;
   if (TREE_CODE (valnum) == SSA_NAME)
     {
+      tree fallback = NULL_TREE;
       if (SSA_NAME_IS_DEFAULT_DEF (valnum))
-       return valnum;
+       fallback = valnum;
       vn_avail *av = VN_INFO (valnum)->avail;
       if (!av)
-       return NULL_TREE;
+       return fallback;
       if (av->location == bb->index)
        /* On tramp3d 90% of the cases are here.  */
        return ssa_name (av->leader);
@@ -6797,7 +6814,7 @@ rpo_elim::eliminate_avail (basic_block bb, tree op)
                  && ! flow_bb_inside_loop_p (gimple_bb (SSA_NAME_DEF_STMT
                                                         (leader))->loop_father,
                                              bb))
-               return NULL_TREE;
+               return fallback;
              if (dump_file && (dump_flags & TDF_DETAILS))
                {
                  print_generic_expr (dump_file, leader);
@@ -6815,6 +6832,7 @@ rpo_elim::eliminate_avail (basic_block bb, tree op)
          av = av->next;
        }
       while (av);
+      return fallback;
     }
   else if (valnum != VN_TOP)
     /* valnum is is_gimple_min_invariant.  */
@@ -7231,6 +7249,9 @@ process_bb (rpo_elim &avail, basic_block bb,
       else
        /* If not eliminating, make all not already available defs
           available.  */
+       /* ???  This avoids pushing ERF_RETURNS_ARG defs but elimination
+          above does not so we get differences in valueization
+          and simplification -O1 vs. -O2.  */
        FOR_EACH_SSA_TREE_OPERAND (op, gsi_stmt (gsi), i, SSA_OP_DEF)
          if (! avail.eliminate_avail (bb, op))
            avail.eliminate_push_avail (bb, op);
-- 
2.26.2


Reply via email to