On 10/24/23 12:49, Richard Sandiford wrote:
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.
So have you done any investigation on cases caught by your new pass between combine and split1 to characterize them? In particular do they point at solvable problems in combine? Or do you forsee this subsuming the old combiner pass at some point in the distant future?

rth and I sketched out an SSA based RTL combine at some point in the deep past. The key goal we were trying to achieve was combining across blocks. We didn't have a functioning RTL SSA form at the time, so it never went to any implementation work. It looks like yours would solve the class of problems rth and I were considering.

[ ... ]


The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?
I'm going to adjust this slightly so that it's enabled across the board and throw it into the tester tomorrow (tester is busy tonight). Even if we make it opt-in on a per-port basis, the alternate target testing does seems to find stuff that needs fixing ;-)




Richard


gcc/
        PR rtl-optimization/106594
        * Makefile.in (OBJS): Add late-combine.o.
        * common.opt (flate-combine-instructions): New option.
        * doc/invoke.texi: Document it.
        * common/config/aarch64/aarch64-common.cc: Enable it by default
        at -O2 and above.
        * tree-pass.h (make_pass_late_combine): Declare.
        * late-combine.cc: New file.
        * passes.def: Add two instances of late_combine.

gcc/testsuite/
        PR rtl-optimization/106594
        * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
        targets.
        * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
        * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
        * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
        * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
        * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
        * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
        described in the comment.
        * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
        * gcc.target/aarch64/pr106594_1.c: New test.



+
+      // Don't substitute into a non-local goto, since it can then be
+      // treated as a jump to local label, e.g. in shorten_branches.
+      // ??? But this shouldn't be necessary.
+      if (use_insn->is_jump ()
+         && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
+       return false;
Agreed that this shouldn't be necessary. In fact, if you can substitute it's potentially very profitable. Then again, I doubt it happens all that much in practice, particularly since I think gimple does squash out some of these.

Nothing jumps out at horribly wrong. You might want/need to reject frame related insns in optimizable_set, though I guess if the dwarf2 writer isn't complaining, then we haven't mucked things up too bad.

Reply via email to