https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
Bug ID: 61915
Summary: [AArch64] Default use of the LRA results in extra code size
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: e.menezes at samsung dot com

The code-size increase that I observed comes from the default use of the LRA:
the FP register that variables are spilled into ends up being spilled itself,
which adds instructions.

For example, in Dhrystone, in the code generated for dhry_1.c, I see sequences
like this:

    ldr   d9, [sp, 144]
    ...
    fmov  x0, d9
    bl    printf
    ...
    fmov  x0, d9
    ...
    bl    printf

With the LRA disabled, the code is a tad leaner (2%):

    ldr   x0, [sp, 144]
    ...
    bl    printf
    ...
    ldr   x0, [sp, 144]
    ...
    bl    printf

Moreover, is transferring values between the GP and the FP register files
always cheap? In some x86 processors (e.g., Opteron), this used to be done
internally through the load-store unit anyway. How is this done internally in
the A53 and the A57?

Is using the LRA by default clearly beneficial in other cases?

At the Cauldron I mentioned some variables that could be rematerialized when
needed instead of being spilled, but I could not reproduce that. I'll try some
more to spot this behavior.
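
For reference, the shape of the source code behind the sequences above can be sketched in C. This is a hypothetical reduction, not code from dhry_1.c: the function name print_twice and its parameter are made up for illustration. A value is loaded once and must stay live across two calls to printf. Under the AArch64 procedure call standard, x0-x18 are not preserved across calls, while x19-x28 and the low 64 bits of d8-d15 are callee-saved, so the value has to live in one of those registers or be reloaded from the stack before each call. A function this small will most likely just use a callee-saved GP register; the FP-register spill reported here presumably only shows up under higher register pressure, as in Dhrystone.

```c
#include <stdio.h>

/* Hypothetical reduction (not from dhry_1.c): 'value' is live across two
   calls, so it cannot stay in a caller-saved register such as x0. */
int print_twice(const long *slot)
{
    long value = *slot;               /* plays the role of the load from the stack slot */
    int n = printf("%ld\n", value);   /* first call clobbers x0-x18 */
    n += printf("%ld\n", value);      /* value is needed again after the call */
    return n;                         /* total number of characters printed */
}
```

Compiling such a function to assembly with and without the LRA and comparing the output is one way to look for the fmov versus ldr difference, although a reduction this small may not trigger it.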