https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #13 from Richard Henderson <rth at gcc dot gnu.org> ---
Created attachment 37911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37911&action=edit
aggressive patch

Consider something like this, whereby we allow (sfp + scale + const) as an
address all the way until register allocation.  LRA already knows how to
decompose this address in order to make it valid, so for your bar example
in #c11 we get

bar:
        stp     x29, x30, [sp, -80]!
        add     x29, sp, 0
        str     x19, [sp, 16]
        mov     w19, w0
        add     x0, x29, 32
        bl      g
        add     x0, x29, 48
        bl      g
        add     x0, x29, 64
        bl      g
        add     x0, x29, 32
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 48
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        add     x0, x29, 64
        ldrb    w0, [x0, w19, sxtw]
        bl      f
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 80
        ret

So, three more instructions than trunk, but no extra saved registers as with
the proposed patch.  The extra instructions are simply a choice that LRA
makes during decomposition.

If we look at a different example,

void baz(int i, int j, int k)
{
  char A[10];
  g(A);
  h(A[i], A[j], A[k]);
}

wherein the offsets are the same but the scale differs, we get

        add     x0, x29, 48
        ldrb    w2, [x0, w21, sxtw]
        ldrb    w1, [x0, w20, sxtw]
        ldrb    w0, [x0, w19, sxtw]
        bl      h

where post-reload-cse unifies the three x29+48 insns.  Compare that to
trunk, which produces

        add     x0, x29, 64
        add     x21, x0, x21, sxtw
        add     x20, x0, x20, sxtw
        add     x19, x0, x19, sxtw
        ldrb    w2, [x21, -16]
        ldrb    w1, [x20, -16]
        ldrb    w0, [x19, -16]
        bl      h

At some point an AArch64 maintainer is going to have to decide what to do
with this PR.  If the answer is to defer it all to gcc7, then we should
downgrade the priority to P4.
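
(For reference, since comment #11 is not quoted here: a minimal hypothetical
sketch of what the bar testcase presumably looks like, inferred only from the
generated code above, namely three local arrays at consecutive frame offsets,
each passed to g() and then indexed by the same argument for f().  The exact
declarations in #c11 may differ.)

/* Hypothetical reconstruction of bar() from comment #11, inferred from the
   assembly above; array sizes and the prototypes of g() and f() are guesses.  */
extern void g(char *);
extern void f(char);

void bar(int i)
{
  char A[10], B[10], C[10];
  g(A);
  g(B);
  g(C);
  f(A[i]);
  f(B[i]);
  f(C[i]);
}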