https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #13 from Richard Henderson <rth at gcc dot gnu.org> ---
Created attachment 37911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37911&action=edit
aggressive patch

Consider something like this, whereby we allow (sfp + scale + const) as an
address all the way until register allocation.  LRA already knows how to
decompose this address in order to make it valid, so for your bar example
in #c11 we get

bar:
        stp     x29, x30, [sp, -80]!
        add     x29, sp, 0
        str     x19, [sp, 16]
        mov     w19, w0
        add     x0, x29, 32
        bl      g
        add     x0, x29, 48
        bl      g
        add     x0, x29, 64
        bl      g
        add     x0, x29, 32
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 48
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 64
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 80
        ret

So, three more instructions than trunk, but no extra saved registers as with
the proposed patch.  The extra instructions are simply a choice that LRA
makes during decomposition.

If we look at a different example,

void baz(int i, int j, int k)
{
  char A[10];
  g(A);
  h(A[i], A[j], A[k]);
}

wherein the offsets are the same but the scale differs, we get

        add     x0, x29, 48
        ldrb    w2, [x0, w21, sxtw]
        ldrb    w1, [x0, w20, sxtw]
        ldrb    w0, [x0, w19, sxtw]
        bl      h

where post-reload-cse unifies the three x29+48 insns.  Compare that to
trunk, which produces

        add     x0, x29, 64
        add     x21, x0, x21, sxtw
        add     x20, x0, x20, sxtw
        add     x19, x0, x19, sxtw
        ldrb    w2, [x21, -16]
        ldrb    w1, [x20, -16]
        ldrb    w0, [x19, -16]
        bl      h

At some point an AArch64 maintainer is going to have to decide what to do
with this PR.  If the answer is to defer it all to gcc7, then we should
downgrade the priority to P4.
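
(For reference, since comment #11 is not quoted here: a minimal hypothetical
sketch of what the bar testcase presumably looks like, inferred only from the
generated code above, namely three local arrays at consecutive frame offsets,
each passed to g() and then indexed by the same argument for f().  The exact
declarations in #c11 may differ.)

/* Hypothetical reconstruction of bar() from comment #11, inferred from the
   assembly above; array sizes and the prototypes of g() and f() are guesses.  */
extern void g(char *);
extern void f(char);

void bar(int i)
{
  char A[10], B[10], C[10];
  g(A);
  g(B);
  g(C);
  f(A[i]);
  f(B[i]);
  f(C[i]);
}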