Given the code below, GCC 4.0 and 4.1 mainline fail to optimize the tail call to memset into a jmp on x86_64-unknown-linux-gnu. Versions 3.3 and 3.4 perform the transformation, so this is a regression. All GCC versions on x86_64 manage to optimize the call to my_memset, so the problem may be related to builtin handling.

FYI, none of the above versions of GCC optimize the tail call to either memset or my_memset on x86. So on x86 it fails consistently, and I'm not sure whether that's intentional. But on x86_64 it is a regression.
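One way to probe the builtin-handling suspicion is sketched here. The function names zero_discard and zero_return are hypothetical and not part of the report. The idea is to compile the same file twice, once with builtin memset handling enabled and once with it disabled, and compare how each function ends in the generated assembly. This is only a diagnostic sketch; it assumes nothing about what either compile will actually produce.

```c
/* Hypothetical diagnostic variant of the testcase.
   The names zero_discard and zero_return are illustrative only.

   Suggested comparison (both flags are standard GCC options):
     gcc -O2 -S probe.c -o with-builtin.s
     gcc -O2 -fno-builtin-memset -S probe.c -o without-builtin.s
   Then check whether each function ends in "jmp memset" (a sibling
   call) or in "call memset" followed by "ret".  */
#include <stddef.h>
#include <string.h>

/* Same shape as foo in the report: memset's result is discarded.  */
void zero_discard (void *to, size_t count)
{
  memset (to, 0, count);
}

/* Variant that passes memset's return value through to the caller,
   to see whether using the result changes the outcome.  */
void *zero_return (void *to, size_t count)
{
  return memset (to, 0, count);
}
```

If the jmp comes back only when -fno-builtin-memset is given, that would point at the builtin expansion path rather than at the generic sibling-call code.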
Compile with -O2 -S:

#include <stddef.h>
extern void *memset (void *, int, size_t);
extern void *my_memset (void *, int, size_t);

void foo (void *to, size_t count)
{
  memset (to, 0, count);
}

void bar (void *to, size_t count)
{
  my_memset (to, 0, count);
}

--
Summary: [4.0,4.1 regression] GCC fails to optimize tail call to memset
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P2
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ghazi at gcc dot gnu dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21265