http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-11-19 12:06:20 UTC ---
The patched compiler produces better binaries, but we still have a 6%
performance degradation on corei7. The main cause is that the LRA compiler
spills the 'pure' byte 'g', whereas the old compiler spills 'm', which is the
negation of 'g':

gcc without LRA (assembly at the head of the loop)

.L7:
    movzbl    1(%edi), %edx
    leal    3(%edi), %ebp
    movzbl    (%edi), %ebx
    movl    %ebp, %edi
    notl    %edx    // perform negation on register
    movb    %dl, 3(%esp)

gcc with LRA

.L7:
    movzbl    (%edi), %ebx
    leal    3(%edi), %ebp
    movzbl    1(%edi), %ecx
    movl    %ebp, %edi
    movzbl    -1(%ebp), %edx
    notl    %ebx
    notl    %ecx
    movb    %dl, (%esp)
    cmpb    %cl, %bl
    notb    (%esp)    // perform negation in memory

i.e. we have a redundant load and store from/to the stack.

I assume that this should be fixed also.