On Thu, 2021-01-21 at 12:29 +0000, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround.  I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing!  I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as part of the IBM Z regression
fix:

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Before long doubles were moved to vector registers and the "f"
constraints were fixed up at the RTL level, code generation for small
glibc functions like __ieee754_sqrtl was fairly efficient.  I am not
sure whether that issue is big enough to justify this common code
change at this point, but still...



v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
formatting.  Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.




Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.
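
The profitability test this patch adds can be sketched outside of GCC
as a plain predicate.  This is only an illustration: the
propagation_facts struct and subreg_to_mem_profitable_p are
hypothetical names that do not exist in fwprop.c, and each field
stands in for the GCC query named in its comment.

```cpp
// Hypothetical stand-in for the facts classify_result consults.
// None of these names exist in GCC; they mirror the checks in the patch.
struct propagation_facts
{
  bool single_use;          // def_insn->num_uses () == 1
  bool single_ebb;          // use_insn->ebb () == def_insn->ebb ()
  bool old_is_subreg;       // SUBREG_P (old_rtx)
  bool old_is_paradoxical;  // paradoxical_subreg_p (old_rtx)
  bool new_is_mem;          // MEM_P (new_rtx)
  bool new_is_volatile;     // MEM_VOLATILE_P (new_rtx)
};

// Return true if replacing (subreg (reg)) with (mem) should count as
// profitable under the conditions listed in the patch.
bool
subreg_to_mem_profitable_p (const propagation_facts &f)
{
  return (f.single_use
          && f.single_ebb
          && f.old_is_subreg
          && !f.old_is_paradoxical
          && f.new_is_mem
          && !f.new_is_volatile);
}
```

For the example above, the def has a single use in the same EBB, the
subreg is not paradoxical, and the mem is not volatile, so the sketch
would report the change as profitable.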

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  <i...@linux.ibm.com>

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.
---
 gcc/fwprop.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..123cc228630 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
     static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
     static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-    fwprop_propagation (rtx_insn *, rtx, rtx);
+    fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
 
     bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
     bool folded_to_constants_p () const;
@@ -185,13 +185,20 @@ namespace
     bool check_mem (int, rtx) final override;
     void note_simplification (int, uint16_t, rtx, rtx) final override;
     uint16_t classify_result (rtx, rtx);
+
+  private:
+    const bool single_use_p;
+    const bool single_ebb_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (insn_info *use_insn,
+                                       insn_info *def_insn, rtx from, rtx to)
+  : insn_propagation (use_insn->rtl (), from, to),
+    single_use_p (def_insn->num_uses () == 1),
+    single_ebb_p (use_insn->ebb () == def_insn->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
       && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
     return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications with the following
+     exceptions:
+     1) Propagating (mem)s into multiple uses is not profitable.
+     2) Propagating (mem)s across EBBs may not be profitable if the source EBB
+       runs less frequently.
+     3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
+     4) Creating new (mem/v)s is not correct, since DCE will not remove the old
+       ones.  */
+  if (single_use_p
+      && single_ebb_p
+      && SUBREG_P (old_rtx)
+      && !paradoxical_subreg_p (old_rtx)
+      && MEM_P (new_rtx)
+      && !MEM_VOLATILE_P (new_rtx))
+    return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2
