[Bug target/47949] Missed optimization for -Os using xchg instead of mov.

svfuerst at gmail dot com Wed, 02 Mar 2011 13:51:26 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949


--- Comment #3 from Steven Fuerst <svfuerst at gmail dot com> 2011-03-02 21:51:12 UTC ---
From a quick look at generated code, this pattern doesn't seem to come up all
that often.  However, there is one case where it does: the function epilogue.
For example, GCC tends to generate code like:

movl    %ebp, %eax
movq    8(%rsp), %rbx
movq    16(%rsp), %rbp
movq    24(%rsp), %r12
movq    32(%rsp), %r13
addq    $40, %rsp
ret

Replacing the move to %eax with an exchange with %ebp is a win in this
particular case: "xchg %eax, %ebp" encodes in one byte versus two for the mov,
and clobbering %ebp is harmless because the following "movq 16(%rsp), %rbp"
reloads it anyway.  The extra cycle or two of latency that xchg takes doesn't
matter, as the other moves and the ret instruction overlap with it in execution.
Benchmarking on an Opteron in 64-bit mode confirms this hypothesis, even in the
degenerate case where no other moves exist:

foo1:
    mov %edi, %eax      # 2 bytes: 89 f8
    retq

foo2:
    xchg %eax, %edi     # 1 byte: 97
    retq

foo1 and foo2 take the same time to execute.
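The reasoning above (a shorter encoding, and an exchange that is harmless because
the clobbered register is dead afterwards) can be sketched as a toy peephole pass.
This is a hypothetical Python model, not GCC's implementation: the Insn IR, the
dead_after/mov_to_xchg helpers and the straight-line liveness check are
illustrative assumptions.  Only the byte encodings and the SysV x86-64
callee-saved register set come from real x86/ABI rules.

```python
from dataclasses import dataclass

# Register numbers follow the x86 encoding order; 12 and 13 stand for r12d/r13d.
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
R12, R13 = 12, 13

# Registers still needed at ret: %eax (the return value) plus the registers the
# SysV x86-64 ABI makes callee-saved (rbx, rbp, rsp, r12-r15).
LIVE_AT_RET = {EAX, EBX, ESP, EBP, 12, 13, 14, 15}


@dataclass
class Insn:
    op: str        # "mov", "xchg", "load", "add" or "ret" (hypothetical toy IR)
    dst: int = -1
    src: int = -1


def reads(insn, reg):
    if insn.op == "mov":
        return insn.src == reg
    if insn.op == "xchg":
        return reg in (insn.dst, insn.src)
    if insn.op == "load":      # movq N(%rsp), %reg only reads %rsp
        return reg == ESP
    if insn.op == "add":       # addq $imm, %reg reads and writes %reg
        return insn.dst == reg
    if insn.op == "ret":
        return reg in LIVE_AT_RET
    return False


def writes(insn, reg):
    if insn.op == "xchg":
        return reg in (insn.dst, insn.src)
    if insn.op in ("mov", "load", "add"):
        return insn.dst == reg
    return False


def dead_after(code, i, reg):
    """True if reg is overwritten before being read on the straight-line
    path after code[i].  Branches are not modelled."""
    for insn in code[i + 1:]:
        if reads(insn, reg):
            return False
        if writes(insn, reg):
            return True
    return True


def mov_to_xchg(code):
    """Rewrite 'mov %src, %eax' as 'xchg %eax, %src' in place when %src is
    dead afterwards, because xchg leaves the old %eax value in %src.
    Only %ecx..%edi (excluding %esp) are handled: they have the one-byte
    0x90+r form, versus two bytes (0x89 /r) for the mov.
    Returns the number of bytes saved."""
    saved = 0
    for i, insn in enumerate(code):
        if (insn.op == "mov" and insn.dst == EAX
                and ECX <= insn.src <= EDI and insn.src != ESP
                and dead_after(code, i, insn.src)):
            insn.op = "xchg"
            saved += 1
    return saved


# The epilogue from the comment above, in the toy IR.
epilogue = [
    Insn("mov", EAX, EBP),   # movl    %ebp, %eax
    Insn("load", EBX),       # movq    8(%rsp), %rbx
    Insn("load", EBP),       # movq    16(%rsp), %rbp
    Insn("load", R12),       # movq    24(%rsp), %r12
    Insn("load", R13),       # movq    32(%rsp), %r13
    Insn("add", ESP),        # addq    $40, %rsp
    Insn("ret"),             # ret
]
print(mov_to_xchg(epilogue), epilogue[0].op)  # prints "1 xchg"
```

The liveness check is what makes the rewrite safe: "mov %ebx, %eax; ret" is
left alone because %rbx is callee-saved and therefore still live at the ret,
while foo1's "mov %edi, %eax" qualifies because %edi is not needed after the
return.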
