Hi!
On Mon, Jan 14, 2019 at 09:53:18PM +, Steve Ellcey wrote:
> I have a question about PR87763, these are aarch64 specific tests
> that are failing after r265398 (combine: Do not combine moves from hard
> registers).
>
> These tests are all failing when the assembler scan looks for
> specific instructions and these instructions are no longer being
> generated. In some cases the new code is no worse than the old code
> (just different) but in most cases the new code is a performance
> regression from the old code.
>
> Note that these tests are generally *very* small functions where the
> body of the function consists of only 1 to 4 instructions so if we
> do not combine instructions involving hard registers there isn't much,
> if any, combining that can be done.
That is why all such hard regs are copied to a new pseudo first now. The
pseudo can usually be combined in the same way as the hard reg could be
before. The extra copy is optimised away by register allocation, in those
cases where that is a good choice (and, alas, register allocation sometimes
makes bad decisions).
> In larger functions this probably
> would not be an issue and I think those cases are where the incentive
> for this patch came from. So my question is, what do we want to
> do about these failures?
Fix them :-)
Some are caused by deficiencies in the target code (or, things that were
not required before, but that now are needed).
Some are shortcomings in register allocation (or elsewhere; but in generic
code).
> Find a GCC patch to generate the better code? If it isn't done by
> combine, how would we do it? Peephole optimizations?
Sometimes that is needed, sure. If your ISA is more orthogonal you need
fewer peepholes, which is great. But most targets can use a few for good
profit.
> Modify the tests to pass with the current output? Which, in my
> opinion would make the tests of not much value.
Sometimes the tests _are_ not much value, aren't really testing what they
intended to test.
> Remove the tests? Tests that search for specific assembly language
> output are rather brittle to begin with and if they are no longer
> serving a purpose after the combine patch, maybe we don't need them.
>
> The tests in question are:
Ah, not too many, I'll look at them all. Please correct me where I make
mistakes, I'm no expert on aarch64.
> gcc.target/aarch64/combine_bfi_1.c
f1:
Trying 9, 8 -> 10:
9: r99:SI=r100:SI&0xffff
REG_DEAD r100:SI
8: r98:SI=r101:SI<<0x8&0x00
REG_DEAD r101:SI
10: r96:SI=r98:SI|r99:SI
REG_DEAD r99:SI
REG_DEAD r98:SI
Failed to match this instruction:
(set (reg:SI 96)
(ior:SI (and:SI (reg:SI 100)
(const_int -16776961 [0xffff]))
(and:SI (ashift:SI (reg:SI 101)
(const_int 8 [0x8]))
(const_int 16776960 [0x00]
Either you need a pattern to match things like this, or combine (or
simplify-rtx) should write is as an lhs zero_extract.
f2 is similar; f3,f4,f5 are similar and/or the test should allow bfxil as
well as bfi.
> gcc.target/aarch64/insv_1.c
This test tests that various combinations with constant integers generate
good code. For the first test, bfi1, we get
Trying 2, 7 -> 13:
2: r92:DI=r95:DI
REG_DEAD r95:DI
7: zero_extract(r92:DI,0x8,0)=r93:DI
REG_DEAD r93:DI
13: x0:DI=r92:DI
REG_DEAD r92:DI
Failed to match this instruction:
(set (reg/i:DI 0 x0)
(ior:DI (and:DI (reg:DI 95)
(const_int -256 [0xff00]))
(reg:DI 93)))
Successfully matched this instruction:
(set (reg/v:DI 92 [ aD.3347 ])
(and:DI (reg:DI 95)
(const_int -256 [0xff00])))
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
(ior:DI (reg/v:DI 92 [ aD.3347 ])
(reg:DI 93)))
allowing combination of insns 2, 7 and 13
which is not what we want. A peephole2, or a define_split, or a
define_insn_and_split would fix this.
bfi2 is similar. movk I think the same as well. set0 and set1 are best
code already IU think.
> gcc.target/aarch64/lsl_asr_sbfiz.c
sbfiz32 (sbfiz64 is fine):
Trying 6 -> 7:
6: r94:SI=r95:SI<<0x1d
REG_DEAD r95:SI
7: r93:SI=r94:SI>>0xa
REG_DEAD r94:SI
Failed to match this instruction:
(set (reg:SI 93)
(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
Say what? Everything was SI, where does that sign_extract:DI come from?
And why isn't it optimised back to SI?
(Please open a PR just for this one, if it isn't obviously a target thing
that causes it).
> gcc.target/aarch64/sve/tls_preserve_1.c
I get
foo:
.LFB0:
.cfi_startproc
stp x29, x30, [sp, -64]!
.cfi_def_cfa_offset 64
.cfi_offset 29, -64
.cfi_offset 30, -56
mrs x1, tpidr_el0
mov x29, sp
stp q0, q1, [sp, 16]
str