https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825
Bug ID: 81825
Summary: x86_64 stack realignment code is suboptimal
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---
I compiled this:
void func()
{
        int var __attribute__((aligned(32)));
        asm volatile ("" :: "m" (var));
}
using gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3). I got (after stripping CFI
stuff):
func:
        leaq    8(%rsp), %r10
        andq    $-32, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r10
        popq    %r10
        popq    %rbp
        leaq    -8(%r10), %rsp
        ret
I have three objections to this code.
1. The push and immediate pop of %r10 seem pointless. Maybe it's due to some
weird DWARF limitation? A register-allocation limitation sounds more likely,
though.
2. The addressing modes used with %r10 are suboptimal. Shouldn't the first
instruction be just movq %rsp, %r10 (with the pushq and the epilogue leaq
adjusted to match)? By my count, this would save 12 bytes of text.
3. Couldn't the whole thing just be:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        # function body here; %rbp can't be used to locate the aligned
        # stack variables, but %rsp can
        leaveq
        ret