Hi Richard,

I did some testing with the attached v2 that does not restrict to UNARY
anymore.  As feared ;) there is some more fallout that I'm detailing below.

On Power there is one guality fail (pr43051-1.c) that I would take
the liberty of ignoring for now.

On x86 there are four fails:

 - cond_op_addsubmuldiv__Float16-2.c: assembler error
   unsupported masking for `vmovsh'.  I guess that's a latent backend
   problem.

 - ifcvt-3.c, pr49095.c: Here we propagate into a compare.  Before, we had
   (cmp (reg/CC) 0) and now we have (cmp (plus (reg1 reg2) 0).
   That looks like a costing problem and can hopefully solveable by making
   the second compare more expensive, preventing the propagation.
   i386 costing (or every costing?) is brittle so that could well break other
   things. 

 - pr88873.c: This is interesting because even before this patch we
   propagated with different register classes (V2DF vs DI).  With the patch
   we check the register pressure, find the class NO_REGS for V2DF and
   abort (because the patch assumes NO_REGS = high pressure).  I'm thinking
   of keeping the old behavior for reg-reg propagations and only checking
   the pressure for more complex operations.

aarch64 has the most fails:

 - One guality fail (same as Power).
 - shrn-combine-[123].c as before.

 - A class of (hopefully, I only checked some) similar cases where we
   propagate an unspec_whilelo into an unspec_ptest.  Before we would only
   set a REG_EQUALS note.
   Before we managed to create a while_ultsivnx16bi_cc whereas now we have
   while_ultsivnx16bi and while_ultsivnx16bi_ptest that won't be combined.
   We create redundant whilelos and I'm not sure how to improve that. I
   guess a peephole is out of the question :)

 - pred-combine-and.c: Here the new propagation appears useful at first.
   We propagate a "vector mask and" into a while_ultsivnx4bi_ptest and the
   individual and registers remain live up to the propagation site (while
   being dead before the patch).
   With the registers dead, combine could create a single fcmgt before.
   Now it only manages a 2->2 combination because we still need the registers
   and end up with two fcmgts.
   The code is worse but this seems more bad luck than anything.

 - Addressing fails from before:  I looked into these and suspect all of
   them are a similar.
   What happens is that we have a poly_int offset that we shift, negate
   and then add to x0.  The result is used as load address.
   Before, we would pull (combine) the (plus x0 reg) into the load keeping
   the neg and shift.
   Now we propagate everything into a single (set (minus x0 offset)).
   The propagation itself seems worthwhile because we save one insn.
   However as we got rid of the base/offset split by lumping everything
   together, combine cannot pull the (plus) into the address load and
   we require an aarch64_split_add_offset.  This will emit the longer
   sequence of ashiftl and subtract.  The "base" address is x0 here so
   we cannot convert (minus x0 ...)) into neg.
   I didn't go through all of aarch64_split_add_offset.  I suppose we
   could re-add the separation of base/offset there but that might be
   a loss when the result is not used as an address. 
   
Again, all in all no fatal problems but pretty annoying :)  It's not much
but just gradually worse than with just UNARY.  Any idea on how/whether to
continue?

Regards
 Robin

gcc/ChangeLog:

        * fwprop.cc (fwprop_propagation::profitable_p): Add unary
        handling.
        (fwprop_propagation::update_register_pressure): New function.
        (fwprop_propagation::register_pressure_high_p): New function
        (reg_single_def_for_src_p): Look through unary expressions.
        (try_fwprop_subst_pattern): Check register pressure.
        (forward_propagate_into): Call new function.
        (fwprop_init): Init register pressure.
        (fwprop_done): Clean up register pressure.
        (fwprop_insn): Add comment.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test.
---
 gcc/fwprop.cc                                 | 359 +++++++++++++++++-
 .../riscv/rvv/autovec/binop/vadd-vx-fwprop.c  |  64 ++++
 2 files changed, 419 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c

diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0707a234726..ce6f5a74b00 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -36,6 +36,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "rtl-iter.h"
 #include "target.h"
+#include "dominance.h"
+
+#include "ira.h"
+#include "regpressure.h"
 
 /* This pass does simple forward propagation and simplification when an
    operand of an insn can only come from a single def.  This pass uses
@@ -103,6 +107,10 @@ using namespace rtl_ssa;
 
 static int num_changes;
 
+/* Keep track of which registers already increased the pressure to avoid double
+   booking.  */
+sbitmap pressure_accounted;
+
 /* Do not try to replace constant addresses or addresses of local and
    argument slots.  These MEM expressions are made only once and inserted
    in many instructions, as well as being used to control symbol table
@@ -181,6 +189,8 @@ namespace
     bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
     bool folded_to_constants_p () const;
     bool profitable_p () const;
+    bool register_pressure_high_p (rtx, rtx, rtx_insn *, rtx_insn *) const;
+    bool update_register_pressure (rtx, rtx, rtx_insn *, rtx_insn *) const;
 
     bool check_mem (int, rtx) final override;
     void note_simplification (int, uint16_t, rtx, rtx) final override;
@@ -340,18 +350,272 @@ fwprop_propagation::profitable_p () const
       && !paradoxical_subreg_p (to))
     return true;
 
-  if (CONSTANT_P (to))
+  /* Disallow propagations into debug insns when it is not a reg-reg
+     propagation.  Their costs are usually 0 and a cost comparison will
+     always be profitable no matter how much we complicate the pattern.  */
+  if (DEBUG_INSN_P (insn))
+    return false;
+
+  /* For more complex operations check each sub-rtx.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, to, NONCONST)
+    {
+      const_rtx x = *iter;
+
+      if (REG_P (x))
+       continue;
+
+      else if (GET_CODE (x) == SUBREG
+         && REG_P (SUBREG_REG (x))
+         && !paradoxical_subreg_p (x))
+       continue;
+
+      else if (rtx_equal_p (x, to))
+       continue;
+
+      else
+       return false;
+    }
+
+  return true;
+}
+
+/* Check if the register pressure in any predecessor block of USE's block
+   until DEF's block is equal or higher than the number of hardregs in NU's
+   register class.  */
+bool
+fwprop_propagation::register_pressure_high_p (rtx nu, rtx old, rtx_insn *def,
+                                             rtx_insn *use) const
+{
+  enum reg_class nu_class, old_class;
+  int nu_nregs, old_nregs;
+  nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs);
+  old_class
+    = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs);
+
+  if (nu_class == NO_REGS && old_class == NO_REGS)
     return true;
 
-  return false;
This patch enables the forwarding of sources from more complex
operations like unary (i.e. vec_duplicate) or binary.  As this
potentially involves replacing a register with a register of a different
class the ira_hoist_pressure machinery is used to calculate the change
in register pressure.  If the propagation would increase the pressure
beyond the number of hard regs, we don't perform it.
+  if (nu_class == old_class)
+    return false;
+
+  basic_block bbfrom = BLOCK_FOR_INSN (def);
+  basic_block bbto = BLOCK_FOR_INSN (use);
+
+  basic_block bb;
+
+  sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+  auto_vec<basic_block> q;
+  q.safe_push (bbto);
+
+  bool high = false;
+
+  while (!q.is_empty ())
+    {
+      bb = q.pop ();
+
+      if (bitmap_bit_p (visited, bb->index))
+       continue;
+
+      /* Nothing to do if the register to be replaced is not live
+        in this BB.  */
+      if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old)))
+       continue;
+
+      /* Nothing to do if the replacement register is already live in
+        this BB.  */
+      if (regpressure_is_live_in (bb, REGNO (nu)))
+       continue;
+
+      bitmap_set_bit (visited, bb->index);
+
+      int press = regpressure_get (bb, nu_class);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       fprintf (dump_file, "pressure for reg %d in bb %d: %d\n",
+                REGNO (nu), bb->index, press);
+
+      if (press + nu_nregs > ira_class_hard_regs_num[nu_class])
+       {
+         high = true;
+         break;
+       }
+
+      if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
+       break;
+
+      edge e;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, bb->preds)
+       {
+         basic_block pred = e->src;
+         q.safe_push (pred);
+       }
+    }
+
+  sbitmap_free (visited);
+
+  return high;
 }
 
-/* Check that X has a single def.  */
+/* Update the register pressure when propagating from DEF to USE
+   replacing OLD with NU.  Return TRUE if the pressure was updated.  */
+bool
+fwprop_propagation::update_register_pressure (rtx nu, rtx old, rtx_insn *def,
+                                             rtx_insn *use) const
+{
+  basic_block bb;
+
+  enum reg_class nu_class, old_class;
+  int nu_nregs, old_nregs;
+  nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs);
+  old_class = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs);
+
+  bool changed = false;
+
+  if (nu_class == old_class)
+    return false;
+
+  basic_block bbfrom = BLOCK_FOR_INSN (def);
+  basic_block bbto = BLOCK_FOR_INSN (use);
+
+  sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+  auto_vec<basic_block> q;
+  q.safe_push (bbto);
+
+  /* Precompute the users.  */
+  auto_vec<rtx_insn *> users;
+  df_ref duse;
+  bool should_decrease = true;
+  FOR_EACH_INSN_USE (duse, def)
+    {
+      rtx dreg = DF_REF_REAL_REG (duse);
+      if (REGNO (dreg) != REGNO (old))
+       continue;
+
+      df_ref op_ref = DF_REG_USE_CHAIN (REGNO (dreg));
+      for (; op_ref; op_ref = DF_REF_NEXT_REG (op_ref))
+       {
+         if (!DF_REF_INSN_INFO (op_ref))
+           continue;
+
+         rtx_insn *insn = DF_REF_INSN (op_ref);
+
+         /* If there are other users in BBTO, never decrease the
+            pressure.  */
+         if (BLOCK_FOR_INSN (insn) == bbto && insn != use)
+           {
+             should_decrease = false;
+             break;
+           }
+
+         users.safe_push (insn);
+       }
+    }
+
+  while (!q.is_empty ())
+    {
+      bb = q.pop ();
+
+      if (bitmap_bit_p (visited, bb->index))
+       continue;
+
+      /* Nothing to do if the register to be replaced is not live
+        in this BB.  */
+      if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old)))
+       continue;
+
+      /* Nothing to do if the new register is already live in this BB.  */
+      if (regpressure_is_live_in (bb, REGNO (nu)))
+       continue;
+
+      bitmap_set_bit (visited, bb->index);
+
+      if (regpressure_is_live_in (bb, REGNO (old)))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file, "increasing pressure for "
+                    "reg %d in bb %d by %d reg(s).\n",
+                    REGNO (nu), bb->index, nu_nregs);
+         regpressure_set_live_in (bb, REGNO (nu));
+         regpressure_increase (bb, nu_class, nu_nregs);
+         changed = true;
+
+         if (should_decrease)
+           {
+             rtx_insn *user;
+             int ix;
+             bool should_decrease_local = true;
+
+             /* If this BB dominates no BB where OLD is used (except BBTO)
+                the register pressure is decreased.  */
+             FOR_EACH_VEC_ELT(users, ix, user)
+               {
+                 basic_block ubb = BLOCK_FOR_INSN (user);
+                 if (ubb == bbto)
+                   continue;
+                 if (dominated_by_p (CDI_DOMINATORS, ubb, bb))
+                   {
+                     should_decrease_local = false;
+                     break;
+                   }
+               }
+
+             if (should_decrease_local)
+               {
+                 if (dump_file && (dump_flags & TDF_DETAILS))
+                   fprintf (dump_file, "decreasing pressure for "
+                            "reg %d in bb %d by %d reg(s).\n",
+                            REGNO (old), bb->index, old_nregs);
+                 regpressure_clear_live_in (bb, REGNO (old));
+                 regpressure_decrease (bb, old_class, old_nregs);
+               }
+           }
+       }
+
+      if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
+       break;
+
+      edge e;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, bb->preds)
+       {
+         basic_block pred = e->src;
+         q.safe_push (pred);
+       }
+    }
+
+  sbitmap_free (visited);
+  return changed;
+}
+
+/* Check that X has a single def.  In case of more complex operations
+   check each sub-rtx.  */
 
 static bool
 reg_single_def_p (rtx x)
 {
-  return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x));
+  if (REG_P (x))
+    return crtl->ssa->single_dominating_def (REGNO (x));
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, x, ALL)
+    {
+      rtx sub = *iter;
+
+      if (SUBREG_P (x) && !contains_paradoxical_subreg_p (sub))
+       sub = SUBREG_REG (x);
+
+      bool ok = REG_P (sub) && crtl->ssa->single_dominating_def (REGNO (sub));
+
+      if (!ok)
+       return false;
+    }
+
+  return true;
 }
 
 /* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
@@ -490,6 +754,81 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, 
insn_change &use_change,
          }
       }
 
+  /* Perform register-pressure checks for all sub-rtxs.
+     If FLAG_IRA_HOIST_PRESSURE is set only allow the propagation if the
+     register pressure is not high.
+     Otherwise only allow the propagation if DEF has only a single use.  */
+  if (ok)
+    {
+      rtx_insn *def_rtl = def_insn->rtl ();
+
+      subrtx_var_iterator::array_type array;
+      FOR_EACH_SUBRTX_VAR (iter, array, src, NONCONST)
+       {
+         rtx srcreg = *iter;
+         rtx dstreg = dest;
+
+         /* For register pressure we do not care about subregs so strip
+            them.  */
+         while (SUBREG_P (dstreg))
+           dstreg = SUBREG_REG (dstreg);
+
+         while (SUBREG_P (srcreg))
+           srcreg = SUBREG_REG (srcreg);
+
+         if (!REG_P (dstreg) || !REG_P (srcreg))
+           continue;
+
+         /* Nothing to do for the same class.  */
+         bool same_class = regpressure_same_class (dstreg, srcreg);
+         if (same_class)
+           {
+             if (flag_ira_hoist_pressure)
+               continue;
+             else
+               {
+                 /* Check if this is the only use.  If not do not
+                    continue.  */
+                 for (use_info *use : def->all_uses ())
+                   if (use->insn () != use_insn)
+                     return false;
+               }
+           }
+
+         /* Never allow different pressure classes when not considering
+            register pressure.  */
+         if (!same_class && !flag_ira_hoist_pressure)
+           return false;
+
+         if (!bitmap_bit_p (pressure_accounted, REGNO (srcreg)))
+           {
+             if (!prop.register_pressure_high_p (srcreg, dstreg, def_rtl,
+                                                 use_rtl))
+               {
+                 bool updated
+                   = prop.update_register_pressure (srcreg, dstreg, def_rtl,
+                                                    use_rtl);
+                 if (updated)
+                   bitmap_set_bit (pressure_accounted, REGNO (srcreg));
+               }
+             else
+               {
+                 if (dump_file && (dump_flags & TDF_DETAILS))
+                   fprintf (dump_file, "cannot propagate from insn %d into"
+                            " insn %d: %s\n", def_insn->uid (),
+                            use_insn->uid (),
+                            "would increase register pressure beyond number"
+                            " of hard regs");
+                 ok = false;
+               }
+           }
+         else
+           if (dump_file && (dump_flags & TDF_DETAILS))
+             fprintf (dump_file, "pressure for reg %d has already"
+                      " been accounted\n", REGNO (srcreg));
+       }
+    }
+
   if (!ok)
     {
       /* The pattern didn't match, but if all uses of SRC folded to
@@ -538,6 +877,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, 
insn_change &use_change,
   confirm_change_group ();
   crtl->ssa->change_insn (use_change);
   num_changes++;
+
   return true;
 }
 
@@ -890,6 +1230,10 @@ fwprop_init (void)
 
   df_analyze ();
   crtl->ssa = new rtl_ssa::function_info (cfun);
+
+  regpressure_init ();
+  pressure_accounted = sbitmap_alloc (DF_REG_SIZE (df));
+  bitmap_clear (pressure_accounted);
 }
 
 static void
@@ -910,6 +1254,9 @@ fwprop_done (void)
     fprintf (dump_file,
             "\nNumber of successful forward propagations: %d\n\n",
             num_changes);
+
+  regpressure_cleanup ();
+  sbitmap_free (pressure_accounted);
 }
 
 /* Try to optimize INSN, returning true if something changes.
@@ -919,6 +1266,10 @@ fwprop_done (void)
 static bool
 fwprop_insn (insn_info *insn, bool fwprop_addr_p)
 {
+  /* TODO: when rejecting propagations due to register pressure
+     we would actually like to try the propagation with most
+     potential = uses first instead of going through them sequentially.  */
+
   for (use_info *use : insn->uses ())
     {
       if (use->is_mem ())
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c
new file mode 100644
index 00000000000..0ed1bb80f73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -march=rv64gcv -mabi=lp64d 
--param=riscv-autovec-preference=fixed-vlmax -fdump-rtl-fwprop1-details" } */
+
+/* Check that we propagate from vec_duplicates into vadd.vv, turning it into
+   vadd.vx.  */
+
+#include <stdint-gcc.h>
+
+/* Define 26 variables so that we can perform two forward propagations until we
+   reach the hard reg limit of 28.  */
+uint64_t *restrict dst1;
+uint64_t *restrict dst2;
+uint64_t *restrict dst3;
+uint64_t *restrict dst4;
+uint64_t *restrict dst5;
+uint64_t *restrict dst6;
+uint64_t *restrict dst7;
+uint64_t *restrict dst8;
+uint64_t *restrict dst9;
+uint64_t *restrict dst10;
+uint64_t *restrict dst11;
+uint64_t *restrict a1;
+uint64_t *restrict a2;
+uint64_t *restrict a3;
+uint64_t *restrict a4;
+uint64_t *restrict a5;
+uint64_t *restrict a6;
+uint64_t *restrict a7;
+uint64_t *restrict a8;
+uint64_t *restrict a9;
+uint64_t *restrict a10;
+uint64_t *restrict a11;
+uint64_t b1;
+uint64_t b2;
+uint64_t b3;
+uint64_t b4;
+
+void foo (int n)
+{
+  __builtin_expect (n % 128, 0);
+  for (int i = 0; i < n; i++)
+    {
+      /* We expect b3 to be propagated into vadd here as well as in the other
+         7 occurences (without increasing the pressure every time).
+         Then b2 is still supposed to be propagated but b1 is not even though
+         it would be better to do so.  */
+      dst1[i] = a1[i] + b3;
+      dst2[i] = a2[i] + b3;
+      dst3[i] = a3[i] + b2;
+      dst4[i] = a4[i] + b1;
+      dst5[i] = a5[i] + b1;
+      dst6[i] = a6[i] + b3;
+      dst7[i] = a7[i] + b3;
+      dst8[i] = a8[i] + b3;
+      dst9[i] = a9[i] + b3;
+      dst10[i] = a10[i] + b3;
+      dst11[i] = a11[i] + b3;
+    }
+}
+
+/* { dg-final { scan-rtl-dump-times "increasing pressure for reg" 2 "fwprop1" 
} } */
+/* { dg-final { scan-rtl-dump-times "pressure for reg \[0-9\]+ has already 
been accounted" 7 "fwprop1" } } */
+/* { dg-final { scan-rtl-dump-times "would increase register pressure beyond 
number of hard regs" 2 "fwprop1" } } */
+/* { dg-final { scan-assembler-times "\tvadd\.vx" 9 } } */
-- 
2.41.0


Reply via email to