On 5/5/23 12:59, Richard Sandiford wrote:
This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common.  It is then impossible for the output
operand to reuse the dying input register.

This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output.  (Matched earlyclobbers are fine.)

Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
on aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:

Target               Tests   Good    Bad   Delta    Best   Worst  Median
======               =====   ====    ===   =====    ====   =====  ======
amdgcn-amdhsa           14      7      7       3     -18      10      -1
arm-linux-gnueabihf     16     15      1     -22      -4       2      -1
csky-elf                 6      6      0     -21      -6      -2      -4
hppa64-hp-hpux11.23      5      5      0      -7      -2      -1      -1
ia64-linux-gnu          16     16      0     -70     -15      -1      -3
m32r-elf                53      1     52      64      -2       8       1
mcore-elf                2      2      0      -8      -6      -2      -6
microblaze-elf         285    283      2    -909     -68       4      -1
mmix                     7      7      0   -2101   -2091      -1      -1
msp430-elf               1      1      0      -4      -4      -4      -4
pru-elf                  8      6      2     -12      -6       2      -2
rx-elf                  22     18      4     -40      -5       6      -2
sparc-linux-gnu         15     14      1     -40      -8       1      -2
sparc-wrs-vxworks       15     14      1     -40      -8       1      -2
visium-elf               2      1      1       0      -2       2      -2
xstormy16-elf            1      1      0      -2      -2      -2      -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.

OK to install?

Yes, Richard.

Thank you for measuring the patch effect.  I wish other people would do the same for patches affecting generated code performance.

Reply via email to