https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121315
--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
So if I artificially increase the cost of the ADDRESS_REG_REG case by 1 in
aarch64_address_cost, then we get the desired codegen:
.L3:
        ldp     q31, q30, [x2], 32
        rev32   v31.16b, v31.16b
        rev32   v30.16b, v30.16b
        stp     q31, q30, [x3], 32
        cmp     x2, x0
        bne     .L3
        ret
So we could do this whenever the tuning says we should try to form LDP/STP,
but it's quite a big hammer: it will penalise cases where LDP/STP cannot be
formed and reg+reg addressing would be beneficial.
What we really need is more information passed down from ivopts to the
address_cost hook, e.g. something that at least tells us whether the base has
multiple address uses in the loop.
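
To make the trade-off concrete, here is a toy model in C contrasting the
blanket +1 on reg+reg with a variant that only applies it when the base has
multiple address uses. This is just a sketch, not GCC code: the enum, the
struct and the function names are all made up for illustration, the baseline
costs are placeholders, and no such information is currently passed to the
address_cost hook.

```c
#include <stdbool.h>

/* Hypothetical, simplified address kinds (not GCC's real enum). */
enum addr_kind { ADDR_REG_IMM, ADDR_REG_REG, ADDR_POST_INC };

/* Hypothetical extra context that ivopts could pass down to the hook. */
struct addr_use_info
{
  bool prefer_ldp_stp;      /* Tuning says we should try to form LDP/STP.  */
  unsigned base_addr_uses;  /* Address uses of this base in the loop.  */
};

/* Placeholder baseline: every addressing mode costs the same.  */
static int
base_cost (enum addr_kind kind)
{
  (void) kind;
  return 0;
}

/* The big hammer: bump reg+reg by 1 whenever LDP/STP is preferred,
   even if this access could never be paired.  */
static int
cost_blunt (enum addr_kind kind, const struct addr_use_info *info)
{
  int cost = base_cost (kind);
  if (kind == ADDR_REG_REG && info->prefer_ldp_stp)
    cost += 1;
  return cost;
}

/* With more information: only bump reg+reg when the base has multiple
   address uses in the loop, i.e. when a pair might actually be formed.  */
static int
cost_informed (enum addr_kind kind, const struct addr_use_info *info)
{
  int cost = base_cost (kind);
  if (kind == ADDR_REG_REG
      && info->prefer_ldp_stp
      && info->base_addr_uses > 1)
    cost += 1;
  return cost;
}
```

In this model a base with a single address use keeps reg+reg at its baseline
cost under cost_informed, whereas cost_blunt penalises it regardless.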