http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53227

             Bug #: 53227
           Summary: [4.8 Regression] FAIL: gcc.target/i386/movbe-2.c
                    scan-assembler-times movbe[ \t] 4
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: ra
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ubiz...@gmail.com
                CC: ber...@gcc.gnu.org, uweig...@gcc.gnu.org,
                    vmaka...@gcc.gnu.org
            Target: i686


Split from PR 53176, which changed lower-subreg to not split subregs early on
x86.

The following testcase

--cut here--
extern long long x;

void foo (long long i)
{
  x = __builtin_bswap64 (i);
}

long long bar ()
{
  return __builtin_bswap64 (x);
}
--cut here--

compiled with -O2 -mmovbe -m32 on an x86 target causes RA to allocate
non-optimal registers for "foo" (forcing a reload), while it is able to
allocate optimal registers in the "bar" case:

bar:
        movbe   x+4, %eax
        movbe   x, %edx
        ret

The situation with foo:

foo:
        pushl   %ebx
        movl    8(%esp), %eax
        movl    12(%esp), %edx
        movl    %eax, %ebx
        movl    %edx, %ecx
        bswap   %ebx
        bswap   %ecx
        movl    %ebx, x+4
        movl    %ecx, x
        popl    %ebx
        ret

This is a noticeable regression from 4.7:

foo:
        movbe   4(%esp), %eax
        movbe   8(%esp), %edx
        movl    %eax, x+4
        movl    %edx, x
        ret

Adding -mregparm=2 does not improve things:

foo:
        pushl   %ebx
        movl    %edx, %ecx
        movl    %eax, %ebx
        bswap   %ecx
        bswap   %ebx
        movl    %ecx, x
        movl    %ebx, x+4
        popl    %ebx
        ret

while 4.7 generates:

foo:
        movbe   %edx, x
        movbe   %eax, x+4
        ret
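
For reference, a minimal sketch of why the good code needs exactly two
32-bit byte swaps and no extra copies: on a 32-bit target a 64-bit byte
swap is just each half swapped independently, with the halves exchanged.
The helper names below (swap32, bswap64_by_halves,
check_bswap64_by_halves) are hypothetical and not part of the testcase
or of GCC; the only GCC-specific call is __builtin_bswap64, used as the
reference result.

--cut here--
```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap one 32-bit word.  This is the unit of work that a single
   bswap (register) or movbe (register <-> memory) instruction performs.  */
static uint32_t swap32 (uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: 64-bit byte swap expressed as two independent
   32-bit swaps.  The swapped low word becomes the new high word and
   the swapped high word becomes the new low word.  */
static uint64_t bswap64_by_halves (uint64_t v)
{
  uint32_t lo = (uint32_t) v;
  uint32_t hi = (uint32_t) (v >> 32);

  return ((uint64_t) swap32 (lo) << 32) | swap32 (hi);
}

/* Sanity check against a hand-computed value and against the builtin
   used in the testcase.  */
static void check_bswap64_by_halves (void)
{
  assert (bswap64_by_halves (0x0102030405060708ULL)
          == 0x0807060504030201ULL);
  assert (bswap64_by_halves (0x1122334455667788ULL)
          == __builtin_bswap64 (0x1122334455667788ULL));
}
```
--cut here--

Since the two halves do not depend on each other, each one can be loaded
or stored with its own movbe, which is what the "bar" code and the 4.7
"foo" code do.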