------- Comment #8 from ubizjak at gmail dot com  2007-04-04 09:21 -------
The difference is in the CALLER_SAVE_PROFITABLE condition. The pseudo that
holds sum is referenced 6 times. When only one foo() is called, the default
CALLER_SAVE_PROFITABLE condition causes RA to allocate a call-clobbered
register (fp or xmm regs are all call-clobbered for x86 targets). When two
calls to foo() are present, the default heuristic

#define CALLER_SAVE_PROFITABLE(REFS, CALLS)   (4 * (CALLS) < (REFS))

pushes the pseudo to memory, as RA does not consider the fact that the pseudo
is used inside the loop.

The default heuristic is _wrong_. When a pseudo is accessed inside the loop, a
call-clobbered register should be allocated, no matter how many calls it
crosses.

This can be confirmed by changing the "double" keyword to "int" in the example
of comment #7. gcc now chooses the ebx register (call-preserved) and the loop
compiles to the expected tight sequence:

test:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $4, %esp
        movl    data, %edx
        movl    (%edx), %eax
        leal    123(%eax), %ebx
        movl    $2, %eax
.L2:
        addl    -4(%edx,%eax,4), %ebx
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2
        call    foo
        call    foo
        movl    %ebx, %eax
        addl    $4, %esp
        popl    %ebx
        popl    %ebp
        ret

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31396
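
For reference, the arithmetic behind the default heuristic can be sketched in a few lines of C. The macro is the one quoted in the comment. The helper names caller_save_profitable() and placement() are made up for illustration and are not GCC functions. The sketch only evaluates the inequality and does not model the rest of the register allocator.

```c
#include <stdio.h>

/* Default heuristic as quoted in the comment. */
#define CALLER_SAVE_PROFITABLE(REFS, CALLS)   (4 * (CALLS) < (REFS))

/* Hypothetical wrapper: returns 1 when the heuristic considers it
   profitable to keep the pseudo in a call-clobbered register and
   save/restore it around calls, 0 otherwise.  */
int caller_save_profitable(int refs, int calls)
{
    return CALLER_SAVE_PROFITABLE(refs, calls) ? 1 : 0;
}

/* Hypothetical helper: a human-readable label for the decision.  */
const char *placement(int refs, int calls)
{
    return caller_save_profitable(refs, calls)
        ? "call-clobbered register"
        : "memory";
}

/* Hypothetical helper: print the decision for one (refs, calls) pair.  */
void report(int refs, int calls)
{
    printf("refs=%d calls=%d -> %s\n", refs, calls, placement(refs, calls));
}
```

With the 6 references mentioned in the comment, one call gives 4 * 1 < 6, which holds, so the pseudo gets a call-clobbered register. Two calls give 4 * 2 < 6, which does not hold, so the pseudo goes to memory. The loop is not part of the inequality at all, which is the problem described in the comment.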