On 1/5/24 10:35, Richard Sandiford wrote:
Jeff Law <jeffreya...@gmail.com> writes:
On 10/24/23 12:49, Richard Sandiford wrote:
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.
So have you done any investigation on cases caught by your new pass
between combine and split1 to characterize them?  In particular do they
point at solvable problems in combine?  Or do you forsee this subsuming
the old combiner pass at some point in the distant future?

Examples like the PR are the main motivation for the pre-RA pass.
There we had an extension that could be combined into an address,
but no longer was after GCC 13.

The PR itself could in principle be fixed in combine (various
approaches were suggested, but not accepted).  But the same problem
applies to multiple uses of extensions.  fwprop can't handle it because
individual propagations are not a win in isolation.  And combine has
a limit of combining 4 insns (with a maximum of 2 output insns, IIRC).
So I don't think either of the existing passes scale to the general case.
Oh, that discussion :(




rth and I sketched out an SSA based RTL combine at some point in the
deep past.  The key goal we were trying to achieve was combining across
blocks.  We didn't have a functioning RTL SSA form at the time, so it
never went to any implementation work.  It looks like yours would solve
the class of problems rth and I were considering.

Yeah, I do see some cases where combining across blocks helps.
The case above is one example of that.  Another is:
Great. The cases I think rth and I were looking at were inspired by a talk at one of the early Cauldrons -- whichever one we had in California. Someone did a talk on cross-block combinations, mostly motivated by missed combinations on ARM.



The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?
I'm going to adjust this slightly so that it's enabled across the board
and throw it into the tester tomorrow (tester is busy tonight).  Even if
we make it opt-in on a per-port basis, the alternate target testing does
seems to find stuff that needs fixing ;-)

Thanks!  As per our off-list discussion, the cris-elf failures showed
up a bug in the handling of call arguments.  Here's an updated version
with that fixed.
Perfect. I'll spin that in the tester overnight. Hopefully that fixes the other ports that failed to build (as opposed to the testsuite failures which I think you've covered as not an issue with the new combine pass, but which are instead port issues.





Yeah.  If I'd posted this earlier in stage 1 (rather than October),
I might have tried teaching shorten_branches how to handle this.
But it felt like it could be a bit of a rabbit hole at this stage.

Nothing jumps out at horribly wrong.  You might want/need to reject
frame related insns in optimizable_set, though I guess if the dwarf2
writer isn't complaining, then we haven't mucked things up too bad.

Ah, yeah, I wondered about that.  But I suppose most prologue insns
don't really create combination opportunities, since most of them
either set up the stack (which we wouldn't combine anyway) or sink
incoming data.  So if there cases where this is a problem, it might
unfortunately be a case of "see what breaks".
The other issue that's been in the back of my mind is costing. But I think the model here is combine without regards to cost.

Let's let the tester chew on the updated version overnight, see what else may pop out, but barring any big surprises, I think we're good to go.



jeff

Reply via email to