https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #11 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to Richard Henderson from comment #10)
> Created attachment 37890 [details]
> second patch
>
> Still going through full testing, but I wanted to post this
> before the end of the day.
>
> This update includes a virt_or_elim_regno_p, as discussed in #c7/#c8.
>
> It also updates aarch64_legitimize_address to treat R0+R1+C as a special
> case of R0+(R1*S)+C. All of the arguments wrt scaling apply to unscaled
> indices as well.
>
> As a minor point, doing some of the expansion in a slightly different
> order results in less garbage rtl being generated in the process.

Richard,

  I just recalled that reassociating the constant offset with the virtual
  frame pointer increases register pressure, and thus causes bad code
  generation in some situations. For example, take the testcase given at
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173#c8

void bar(int i)
{
  char A[10];
  char B[10];
  char C[10];
  g(A);
  g(B);
  g(C);
  f(A[i]);
  f(B[i]);
  f(C[i]);
  return;
}

Before your patch we are generating (-O2):
===
bar:
        stp     x29, x30, [sp, -80]!
        add     x29, sp, 0
        add     x1, x29, 80
        str     x19, [sp, 16]
        mov     w19, w0
        add     x0, x29, 32
        add     x19, x1, x19, sxtw
        bl      g
        add     x0, x29, 48
        bl      g
        add     x0, x29, 64
        bl      g
        ldrb    w0, [x19, -48]
        bl      f
        ldrb    w0, [x19, -32]
        bl      f
        ldrb    w0, [x19, -16]
        bl      f
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 80
        ret

After your patch, we are generating:
===
bar:
        stp     x29, x30, [sp, -96]!
        add     x29, sp, 0
        stp     x21, x22, [sp, 32]
        add     x22, x29, 48
        stp     x19, x20, [sp, 16]
        mov     w19, w0
        mov     x0, x22
        add     x21, x29, 64
        add     x20, x29, 80
        bl      g
        mov     x0, x21
        bl      g
        mov     x0, x20
        bl      g
        ldrb    w0, [x22, w19, sxtw]
        bl      f
        ldrb    w0, [x21, w19, sxtw]
        bl      f
        ldrb    w0, [x20, w19, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x21, x22, [sp, 32]
        ldp     x29, x30, [sp], 96
        ret

We are now using more callee-saved registers (x19-x22 instead of only x19),
so extra stp/ldp instructions are generated and the frame grows from 80 to
96 bytes.

But we do benefit from reassociating the constant offset with the virtual
frame pointer when the access is inside a loop, because:

  * vfp + const_offset is loop invariant
  * virtual register elimination on vfp will eventually generate one extra
    instruction if vfp was combined with another register rather than with
    const_offset.

Thus, after this reassociation, rtl IVOPT can hoist it out of the loop, and
we save two instructions in the loop.

A fix was proposed for loop-invariant.c to do such reshuffling only inside
loops, see https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01253.html.

That patch was eventually dropped because PR62173 was fixed at the tree
level, and the pointer reshuffling was considered to carry a hidden overflow
risk, though a very rare one.
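
As a rough source-level illustration of the loop case described above, the sketch below writes the two address forms out by hand. It is a hypothetical example, not taken from PR62173 or from the proposed loop-invariant.c patch, and the names sum_naive, sum_reassociated, OFFSET and LEN are assumptions made only for illustration. The compiler makes this choice on RTL, not in source.

```c
#include <stddef.h>

enum { OFFSET = 4, LEN = 16 };

/* Address formed as (base + i) + OFFSET: the constant is re-added
   on every iteration. */
int sum_naive(const char *base, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += *((base + i) + OFFSET);
    return sum;
}

/* Address formed as (base + OFFSET) + i: base + OFFSET does not depend
   on i, so it is computed once before the loop (by hand here, standing
   in for what invariant hoisting would do). */
int sum_reassociated(const char *base, size_t n)
{
    const char *hoisted = base + OFFSET;  /* loop invariant */
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += hoisted[i];
    return sum;
}
```

Both functions read the same bytes. The hoisted form keeps one extra value live across the loop, which is the same register cost that shows up in the non-loop bar() case, where there is no loop to amortize it.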