http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58622
Bug ID: 58622
Summary: With -fomit-frame-pointer, A64 does not generate
post-decrement stores
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: b.grayson at samsung dot com
Target: AArch64
Build: 4.9.0 20130602
In A64, compiling the following simple program under -O3 gives the assembly
shown after it:
int bar(int i);
int foo() { return bar(5)+4; }
A64 -O3 assembly:
foo:
    stp x29, x30, [sp, -16]!
    add x29, sp, 0
    mov w0, 5
    bl bar
    add w0, w0, 4
    ldp x29, x30, [sp], 16
    ret
Note the use of update-form (writeback) load and store for the SP: the stp is
pre-indexed and the ldp is post-indexed.
But if one uses -O3 -fomit-frame-pointer, the following is obtained:
foo:
    sub sp, sp, #16
    mov w0, 5
    str x30, [sp]
    bl bar
    add w0, w0, 4
    ldr x30, [sp]
    add sp, sp, 16
    ret
The sub and str could be merged into str x30, [sp, #-16]! (pre-indexed), and
the ldr and add could be merged into ldr x30, [sp], #16 (post-indexed), if I
have my assembly correct, as they were in the with-frame-pointer case. On some
ARM implementations, the base-register updates are "for free", so one would get
better performance with the merged load/store instructions, not to mention
better instruction-cache density.
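For reference, here is a hand-written sketch of the sequence being asked for.
It is not compiler output, and it assumes the rest of the function is unchanged
and the 16-byte adjustment is kept so that SP stays aligned:
foo:
    str x30, [sp, -16]!    // pre-indexed: sp -= 16, then store x30 at [sp]
    mov w0, 5
    bl bar
    add w0, w0, 4
    ldr x30, [sp], 16      // post-indexed: load x30 from [sp], then sp += 16
    ret
That would be six instructions instead of the eight in the -fomit-frame-pointer
listing above.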
Note that under A32, identical code (using update/post-decrement stores) is
generated regardless of the -fomit-frame-pointer setting.