I noticed that in function prologues and epilogues, GCC prefers a sequence of MOVs followed by a single SP adjustment over a sequence of pushes/pops. Preferring the MOVs is good for older CPU microarchitectures (before Pentium 4 and K10), because it breaks the data dependency on the stack pointer between consecutive pushes/pops. In modern microarchitectures, push/pop is implemented with a mechanism called the stack engine. The hardware removes the data dependency, so push/pop becomes very cheap (1 uop, 1 cycle latency), and the instructions are also smaller. There is no longer any need to avoid them. This is also what ICC does.
The following patch fixes the problem. It passes bootstrap and regression testing. OK to install?

thanks,

David

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 194324)
+++ config/i386/i386.c	(working copy)
@@ -1919,10 +1919,10 @@ static unsigned int initial_ix86_tune_fe
   m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
-  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_ATHLON_K8,
 
   /* X86_TUNE_EPILOGUE_USING_MOVE */
-  m_PPRO | m_CORE2I7 | m_ATOM | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_ATHLON_K8,
 
   /* X86_TUNE_SHIFT1 */
   ~m_486,

2012-12-08  Xinliang David Li  <davi...@google.com>

	* config/i386/i386.c: Enable push/pop in pro/epilogue for modern CPUs.
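
For readers less familiar with the tuning table, here is a rough, standalone sketch of how a per-feature processor bitmask like the ones changed above can be tested against the selected tuning target. The enum values, the mask names, and the helper `tune_enabled` are hypothetical and only for illustration; they are not GCC's actual definitions, which live in config/i386.

```c
#include <stdbool.h>

/* Hypothetical processor indices, for illustration only.  GCC's real
   processor enumeration has more entries and different values.  */
enum proc { PROC_PPRO, PROC_K8, PROC_CORE2I7, PROC_ATOM, PROC_GENERIC };

/* One bit per processor, in the spirit of the m_* masks in the patch.  */
#define M(p) (1u << (p))

enum {
  /* Assumed stand-in for the table entry before the patch.  */
  OLD_PROLOGUE_USING_MOVE =
    M (PROC_PPRO) | M (PROC_CORE2I7) | M (PROC_ATOM) | M (PROC_K8)
    | M (PROC_GENERIC),
  /* Assumed stand-in for the table entry after the patch.  */
  NEW_PROLOGUE_USING_MOVE = M (PROC_PPRO) | M (PROC_K8)
};

/* Return true if the feature described by FEATURE_MASK is enabled
   when tuning for processor TUNE.  */
static bool
tune_enabled (unsigned int feature_mask, enum proc tune)
{
  return (feature_mask & M (tune)) != 0;
}
```

Under these assumptions, tuning for the Core 2 / Core i7 entry selects the MOV-based prologue with the old mask and the push/pop-based one with the new mask, while the PPro and K8 entries keep using MOVs either way.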