I am aware that the developers have closed, as WONTFIX, the report of GCC being a pessimising compiler with respect to some global register variable issues: <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596>
GCC is copying registers for no good reason whatsoever. Below is a very simple example where gcc 3.3.6 does a better job of optimising the code. Unnecessary copying of registers may also occur with local register variables.

#include <stdint.h>

register uint64_t global_flag_stack __asm__("rbx");

void push_flag_into_global_reg_var(uint64_t a, uint64_t b) {
  uint64_t flag = (a==b);
  global_flag_stack <<= 8;
  global_flag_stack |= flag;
}

uint64_t push_flag_into_local_var(uint64_t a,
                                  uint64_t b,
                                  uint64_t local_flag_stack) {
  uint64_t flag = (a==b);
  local_flag_stack <<= 8;
  return local_flag_stack | flag;
}

int main() {
}


gcc-3.3 (GCC) 3.3.6 (Debian 1:3.3.6-15):

$ gcc-3.3 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400478 <push_flag_into_global_reg_var>:
  400478:       31 c0                   xor    eax,eax
  40047a:       48 39 f7                cmp    rdi,rsi
  40047d:       0f 94 c0                sete   al
  400480:       48 c1 e3 08             shl    rbx,0x8
  400484:       48 09 c3                or     rbx,rax
  400487:       c3                      ret

0000000000400488 <push_flag_into_local_var>:
  400488:       31 c0                   xor    eax,eax
  40048a:       48 39 f7                cmp    rdi,rsi
  40048d:       0f 94 c0                sete   al
  400490:       48 c1 e2 08             shl    rdx,0x8
  400494:       48 09 d0                or     rax,rdx
  400497:       c3                      ret
...


gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-29):

$ gcc-4.1 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400448 <push_flag_into_global_reg_var>:
  400448:       48 89 da                mov    rdx,rbx
  40044b:       31 c0                   xor    eax,eax
  40044d:       48 c1 e2 08             shl    rdx,0x8
  400451:       48 39 f7                cmp    rdi,rsi
  400454:       0f 94 c0                sete   al
  400457:       48 89 d3                mov    rbx,rdx
  40045a:       48 09 c3                or     rbx,rax
  40045d:       c3                      ret

000000000040045e <push_flag_into_local_var>:
  40045e:       48 c1 e2 08             shl    rdx,0x8
  400462:       31 c0                   xor    eax,eax
  400464:       48 39 f7                cmp    rdi,rsi
  400467:       0f 94 c0                sete   al
  40046a:       48 09 d0                or     rax,rdx
  40046d:       c3                      ret
...


gcc-4.5 (Debian 4.5.0-1) 4.5.0:

$ gcc-4.5 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400494 <push_flag_into_global_reg_var>:
  400494:       31 d2                   xor    edx,edx
  400496:       48 39 f7                cmp    rdi,rsi
  400499:       48 89 d8                mov    rax,rbx
  40049c:       0f 94 c2                sete   dl
  40049f:       48 c1 e0 08             shl    rax,0x8
  4004a3:       48 89 d3                mov    rbx,rdx
  4004a6:       48 09 c3                or     rbx,rax
  4004a9:       c3                      ret

00000000004004aa <push_flag_into_local_var>:
  4004aa:       48 89 d0                mov    rax,rdx
  4004ad:       31 d2                   xor    edx,edx
  4004af:       48 c1 e0 08             shl    rax,0x8
  4004b3:       48 39 f7                cmp    rdi,rsi
  4004b6:       0f 94 c2                sete   dl
  4004b9:       48 09 d0                or     rax,rdx
  4004bc:       c3                      ret
...

The object code that current GCC is generating is embarrassing compared with GCC 3.3.6. Is it also necessary to increase the code footprint of push_flag_into_local_var when optimising for size (-Os) when compared to gcc 3.3.6 and 4.1.3?


-- 
           Summary: Global Register variable pessimisation and regression
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: adam at consulting dot net dot nz


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281