https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915

            Bug ID: 61915
           Summary: [AArch64] Default use of the LRA results in extra code
                    size
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: e.menezes at samsung dot com

With the LRA enabled by default, I observe an increase in code size: variables
are spilled into an FP register, and that FP register must in turn be spilled
itself.

For example, in Dhrystone, out of dhry_1.c I see sequences like this:

  ldr    d9, [sp, 144]
  ...
  fmov    x0, d9
  bl    printf
  ...
  fmov    x0, d9
  ...
  bl    printf

Disabling the LRA makes the code a tad leaner (2%):

  ldr    x0, [sp, 144]
  ...
  bl    printf
  ...
  ldr    x0, [sp, 144]
  ...
  bl    printf
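
For reference, the shape of source involved can be sketched as below.  This is
a hypothetical reduction, not code taken from dhry_1.c, and the name
print_twice is made up for illustration.  It only shows a value that is loaded
once and must stay live across two calls to printf.  On its own it may well not
reproduce the fmov sequence, since that presumably also depends on register
pressure in the surrounding function.

```c
#include <stdio.h>

/* Hypothetical reduction, not taken from dhry_1.c: `value` is loaded once
   and has to survive two calls, so the allocator must keep it in a
   callee-saved register or reload it from memory before each call. */
int print_twice(const long *slot)
{
    long value = *slot;
    int written = 0;

    /* printf returns the number of characters written. */
    written += printf("first:  %ld\n", value);
    written += printf("second: %ld\n", value);
    return written;
}
```

Comparing the assembly generated at the same optimization level with the LRA
enabled and disabled (assuming the AArch64 port's transitional -mlra/-mno-lra
switch is available in the compiler at hand) would show which of the two
sequences above is emitted.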

Moreover, is transferring values between the GP and the FP register files
always cheap?  In some x86 processors (e.g., Opteron) this used to be
accomplished internally through the load-store unit anyway.  How is it
accomplished internally in A53 and A57?

Is using the LRA by default clearly beneficial in other cases?

At the Cauldron I mentioned some variables that could be rematerialized when
needed instead of being spilled, but I could not reproduce that.  I'll keep
trying to spot this behavior.
