Re: [PATCH] Add a late-combine pass [PR106594]

Jeff Law Tue, 02 Jan 2024 20:20:59 -0800



On 10/24/23 12:49, Richard Sandiford wrote:

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

So have you done any investigation on cases caught by your new passbetween combine and split1 to characterize them? In particular do theypoint at solvable problems in combine? Or do you forsee this subsumingthe old combiner pass at some point in the distant future?

rth and I sketched out an SSA based RTL combine at some point in thedeep past. The key goal we were trying to achieve was combining acrossblocks. We didn't have a functioning RTL SSA form at the time, so itnever went to any implementation work. It looks like yours would solvethe class of problems rth and I were considering.


[ ... ]


The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

I'm going to adjust this slightly so that it's enabled across the boardand throw it into the tester tomorrow (tester is busy tonight). Even ifwe make it opt-in on a per-port basis, the alternate target testing doesseems to find stuff that needs fixing ;-)


Richard


gcc/
        PR rtl-optimization/106594
        * Makefile.in (OBJS): Add late-combine.o.
        * common.opt (flate-combine-instructions): New option.
        * doc/invoke.texi: Document it.
        * common/config/aarch64/aarch64-common.cc: Enable it by default
        at -O2 and above.
        * tree-pass.h (make_pass_late_combine): Declare.
        * late-combine.cc: New file.
        * passes.def: Add two instances of late_combine.

gcc/testsuite/
        PR rtl-optimization/106594
        * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
        targets.
        * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
        * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
        * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
        * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
        * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
        * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
        described in the comment.
        * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
        * gcc.target/aarch64/pr106594_1.c: New test.


+
+      // Don't substitute into a non-local goto, since it can then be
+      // treated as a jump to local label, e.g. in shorten_branches.
+      // ??? But this shouldn't be necessary.
+      if (use_insn->is_jump ()
+         && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
+       return false;

Agreed that this shouldn't be necessary. In fact, if you can substituteit's potentially very profitable. Then again, I doubt it happens allthat much in practice, particularly since I think gimple does squash outsome of these.

Nothing jumps out at horribly wrong. You might want/need to rejectframe related insns in optimizable_set, though I guess if the dwarf2writer isn't complaining, then we haven't mucked things up too bad.

Re: [PATCH] Add a late-combine pass [PR106594]

Reply via email to