[Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies

acoplan at gcc dot gnu.org via Gcc-bugs Fri, 05 Jul 2024 03:51:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991


--- Comment #4 from Alex Coplan <acoplan at gcc dot gnu.org> ---
So the following change is enough to fix the ldp that we missed because of alias analysis:

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 31d2c21c88f..ab49d955ccf 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -128,8 +128,12 @@ pair_fusion::run ()
   if (!track_loads_p () && !track_stores_p ())
     return;

+  init_alias_analysis ();
+
   for (auto bb : crtl->ssa->bbs ())
     process_block (bb);
+
+  end_alias_analysis ();
 }

 // State used by the pass for a given basic block.

That explains why sched1 was able to do the reordering but we weren't able to
do it in ldp_fusion1: sched1 makes these calls.  Essentially, this enables a
mini-pass that establishes register equivalences, which allows the calls to
canon_rtx inside the alias machinery to rewrite the memcpy accesses in terms
of the sfp (soft frame pointer) for alias disambiguation purposes.  For the
testcase in #c1:

--- without-patch.s     2024-07-05 11:33:57.395927975 +0100
+++ with-patch.s        2024-07-05 11:33:32.164155523 +0100
@@ -17,9 +17,8 @@
        bl      g
        add     x0, sp, 32
        ldp     q31, q30, [x19]
-       ldr     q29, [x19, 32]
        str     q31, [sp, 32]
-       ldr     q31, [x19, 48]
+       ldp     q29, q31, [x19, 32]
        stp     q30, q29, [x0, 16]
        str     q31, [x0, 48]
        bl      h
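The effect of those register equivalences on alias disambiguation can be pictured with a toy model. The C++ sketch below is not GCC's alias code: `Addr`, `Access`, `canonicalize` and `query` are hypothetical names, the equivalence `memcpy_pseudo = sfp + 32` is an assumed example, and accesses whose canonical bases differ are simply treated as may-alias. It only shows how, once an equivalence is known, two accesses that previously had unrelated bases can be compared by offset and proved disjoint.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy address: "base register + byte offset" (not a GCC rtx).
struct Addr {
  std::string base;
  std::int64_t offset;
};

// A memory access of `size` bytes at `addr`.
struct Access {
  Addr addr;
  std::int64_t size;
};

// Hypothetical equivalence table: register -> address it is known to equal,
// e.g. "memcpy_pseudo" -> sfp + 32.
using EquivMap = std::map<std::string, Addr>;

// Rewrite an address in terms of its ultimate base, loosely mirroring what
// canon_rtx achieves once register equivalences are known.
Addr canonicalize(Addr a, const EquivMap& equivs) {
  // Follow chains such as reg -> reg2 + k -> sfp + k2; bounded to avoid cycles.
  for (int depth = 0; depth < 16; ++depth) {
    auto it = equivs.find(a.base);
    if (it == equivs.end()) break;
    a = Addr{it->second.base, it->second.offset + a.offset};
  }
  return a;
}

enum class AliasResult { NoAlias, MayAlias };

// Same-base accesses are disambiguated by comparing byte ranges; in this toy,
// different bases (with no MEM_EXPR-like information) are conservatively may-alias.
AliasResult query(const Access& x, const Access& y, const EquivMap& equivs) {
  assert(x.size > 0 && y.size > 0);
  Addr a = canonicalize(x.addr, equivs);
  Addr b = canonicalize(y.addr, equivs);
  if (a.base != b.base) return AliasResult::MayAlias;
  bool disjoint =
      a.offset + x.size <= b.offset || b.offset + y.size <= a.offset;
  return disjoint ? AliasResult::NoAlias : AliasResult::MayAlias;
}
```

For example, with that assumed equivalence a 16-byte access at `[memcpy_pseudo, 16]` canonicalizes to `sfp + 48` and no longer overlaps a 16-byte access at `[sfp, 32]`; without the equivalence the two bases differ and the query can only answer may-alias.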

For this testcase we still miss the stp, since the stores have different RTL
bases (sfp vs. the memcpy pseudo) and no MEM_EXPR information.  If we go ahead
with the above change, then in theory we could also use this register
equivalence information during discovery (not just for alias analysis), which
would allow us to form the remaining stp.
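Using the equivalence during discovery can be sketched as a toy as well. This assumes the equivalence `x0 = sp + 32` implied by `add x0, sp, 32` in the output above, and treats the four 16-byte stores in that output as separate stores before fusion. The names `QStore`, `Equivs` and `find_stp_candidates` are hypothetical; the sketch ignores program order, aliasing and register constraints that the real pass has to respect, and it pairs greedily from the lowest offset. The offset check uses the Q-register STP immediate range (a multiple of 16 in [-1024, 1008]).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A 16-byte (Q-register) store written as "base register + byte offset".
struct QStore {
  std::string base;
  std::int64_t offset;
};

// Hypothetical equivalence table: register -> (base, offset) it is known to
// equal, e.g. "x0" -> ("sp", 32) after "add x0, sp, 32".
using Equivs = std::map<std::string, std::pair<std::string, std::int64_t>>;

// A Q-register STP encodes a signed 7-bit immediate scaled by 16.
bool stp_q_offset_ok(std::int64_t off) {
  return off % 16 == 0 && off >= -1024 && off <= 1008;
}

// Group stores by (canonicalized) base, then greedily pair stores whose offsets
// are exactly 16 bytes apart. Returns (base, lower offset) per candidate STP.
std::vector<std::pair<std::string, std::int64_t>>
find_stp_candidates(const std::vector<QStore>& stores, const Equivs& equivs) {
  std::map<std::string, std::vector<std::int64_t>> by_base;
  for (const QStore& s : stores) {
    assert(!s.base.empty());  // every store must name a base register
    std::string base = s.base;
    std::int64_t off = s.offset;
    if (auto it = equivs.find(base); it != equivs.end()) {
      base = it->second.first;    // rewrite in terms of the equivalent base
      off += it->second.second;
    }
    by_base[base].push_back(off);
  }

  std::vector<std::pair<std::string, std::int64_t>> pairs;
  for (auto& [base, offs] : by_base) {
    std::sort(offs.begin(), offs.end());
    for (std::size_t i = 0; i + 1 < offs.size();) {
      if (offs[i + 1] - offs[i] == 16 && stp_q_offset_ok(offs[i])) {
        pairs.emplace_back(base, offs[i]);
        i += 2;  // both stores are consumed by this pair
      } else {
        i += 1;
      }
    }
  }
  return pairs;
}
```

With the stores `(sp, 32)`, `(x0, 16)`, `(x0, 32)` and `(x0, 48)` and no equivalences, only one pair is found (at `x0 + 16`, matching the single stp in the output). With `x0` mapped to `sp + 32`, the offsets become 32, 48, 64 and 80 relative to `sp`, and two pairs are found.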

While the above patch seems to improve performance overall, there is one
workload with a significant compile-time regression which needs investigating.

There are also some code-size regressions, which I think occur because we form
more stack-based LDPs; this scuppers the IRA REG_EQUIV optimization that avoids
spilling registers that were loaded from the stack.

So a bit more work is needed before we can go ahead with this.

