https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #9 from Richard Henderson <rth at gcc dot gnu.org> ---

While I fully believe in CSE'ing "base + reg*scale" when talking about non-stack-based pointers, when it comes to stack-based data access I'm less certain about the proper approach.

All things work out "best" when there's no (or little) offset applied during register elimination. When this can be true, all of the rtl optimizations see the final address and can do the right thing.

This isn't easy to do for AArch64, however. So we need to accept that some concessions need to be made so that it's not too difficult to turn reg + scale + c1 + c2 into a final address without extra steps.

We already special-case the eliminable frame registers in aarch64_classify_address to allow an arbitrary offset, and we're prepared to split to a proper offset during RA. It wouldn't be out of the question to allow "reg + scale + c" as well.

We can probably come up with some good heuristics for splitting into a number of cases based on the generalized "((reg + hi_c) + scale) + lo_c". But the patch we take for stage4 must be less than the full solution.
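
To make the "((reg + hi_c) + scale) + lo_c" shape concrete, here is a minimal sketch of one possible way to split a combined constant c = c1 + c2 into hi_c and lo_c. It is not GCC code and not the proposed patch: split_offset and split_frame_offset are hypothetical names, and the only rule assumed is that lo_c should fit the unsigned scaled 12-bit immediate form of an AArch64 load/store (a non-negative multiple of the access size, at most 4095 times that size). An offset that is not a multiple of the access size is left entirely in hi_c, which is just one simple policy among several.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: c == hi + lo always holds. */
typedef struct {
    int64_t hi;   /* part folded into the base: (reg + hi) */
    int64_t lo;   /* part left as the load/store immediate */
} split_offset;

/* Hypothetical helper, not part of GCC.
   Assumes lo must be a non-negative multiple of access_size
   that is at most 4095 * access_size.  */
static split_offset split_frame_offset(int64_t c, int64_t access_size)
{
    split_offset s;
    int64_t range = 4096 * access_size;

    assert(access_size > 0);

    /* Unaligned offsets cannot use the scaled form at all, so
       this sketch simply keeps everything in hi.  */
    if (c % access_size != 0) {
        s.hi = c;
        s.lo = 0;
        return s;
    }

    /* C's % truncates toward zero, so fix up negative remainders
       to get lo into [0, range).  */
    s.lo = c % range;
    if (s.lo < 0)
        s.lo += range;
    s.hi = c - s.lo;
    return s;
}
```

For an 8-byte access at offset 40000, this gives hi_c = 32768 and lo_c = 7232. Because hi_c depends only on which 4096-slot window the offset falls into, neighbouring stack slots share the same "reg + hi_c" and that sum is what one would hope to see CSE'd.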