On 6/20/24 7:34 AM, Richard Sandiford wrote:
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
I would expect this to fix a problem we've seen on RISC-V as well. Essentially we have A, B an C. We want to combine A->B and A->C generating B' and C' and eliminate A. This shows up in the xz loop.


.

On most targets, the pass is enabled by default at -O2 and above.
However, it has a tendency to undo x86's STV and RPAD passes,
by folding the more complex post-STV/RPAD form back into the
simpler pre-pass form.
IIRC the limited enablement was one of the things folks were unhappy about in the gcc-14 cycle. Good to see that addressed.



Also, running a pass after register allocation means that we can
now match define_insn_and_splits that were previously only matched
before register allocation.  This trips things like:

   (define_insn_and_split "..."
     [...pattern...]
     "...cond..."
     "#"
     "&& 1"
     [...pattern...]
     {
       ...unconditional use of gen_reg_rtx ()...;
     }

because matching and splitting after RA will call gen_reg_rtx when
pseudos are no longer allowed.  rs6000 has several instances of this.
Interesting. I suspect ppc won't be the only affected port. This is somewhat worrisome.


xtensa has a variation in which the split condition is:

     "&& can_create_pseudo_p ()"

The failure then is that, if we match after RA, we'll never be
able to split the instruction.

The patch therefore disables the pass by default on i386, rs6000
and xtensa.  Hopefully we can fix those ports later (if their
maintainers want).  It seems easier to add the pass first, though,
to make it easier to test any such fixes.
I suspect it'll be a "does this make code better on the port, then let's fix the port so it can be used consistently" kind of scenario. Given the data you've presented I strongly suspect it would make the code better on the xtensa, so hopefully Max will do the gruntwork on that one.



gcc/
        PR rtl-optimization/106594
        * Makefile.in (OBJS): Add late-combine.o.
        * common.opt (flate-combine-instructions): New option.
        * doc/invoke.texi: Document it.
        * opts.cc (default_options_table): Enable it by default at -O2
        and above.
        * tree-pass.h (make_pass_late_combine): Declare.
        * late-combine.cc: New file.
        * passes.def: Add two instances of late_combine.
        * config/i386/i386-options.cc (ix86_override_options_after_change):
        Disable late-combine by default.
        * config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
        * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.

gcc/testsuite/
        PR rtl-optimization/106594
        * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
        targets.
        * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
        * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
        * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
        -fno-late-combine-instructions.
        * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
        * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
        * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
        * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
        * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
        described in the comment.
        * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
        * gcc.target/aarch64/pr106594_1.c: New test.
---


OK. Obviously we'll need to keep an eye on testing state after this patch. I do expect fallout from the splitter issue noted above, but IMHO those are port problems for the port maintainers to sort out.

Jeff

Reply via email to