http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53639
Bug #: 53639
Summary: x86_64: redundant 64-bit operations on 32-bit integers
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: mi...@it.uu.se

Created attachment 27605
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27605
test case

> cat redundant-64-bit-ops.c
struct tlb_entry {
    unsigned int tag;
    unsigned int va_pa_off;
};

struct core {
    struct tlb_entry tlb[1 << 10];
    unsigned char *hmem;
};

unsigned char read_byte_slow(struct core *core, unsigned int va);

static inline unsigned char read_byte(struct core *core, unsigned int va)
{
    unsigned int vpn;
    struct tlb_entry *te;

    vpn = va >> 12;
    te = &core->tlb[vpn & ((1 << 10) - 1)];
    if (__builtin_expect(te->tag != vpn, 0))
        return read_byte_slow(core, va);
    return *(core->hmem + va + te->va_pa_off);
}

unsigned char foo(struct core *core, unsigned int *q)
{
    return read_byte(core, *q);
}
> /tmp/objdir/gcc/xgcc -B/tmp/objdir/gcc -O3 -S redundant-64-bit-ops.c
> cat redundant-64-bit-ops.s
        .file   "redundant-64-bit-ops.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB1:
        .cfi_startproc
        movl    (%rsi), %esi            (G)
        movl    %esi, %edx
        shrl    $12, %edx               (C)
        movq    %rdx, %rcx              (A)
        andl    $1023, %ecx             (B)
        leaq    (%rdi,%rcx,8), %rcx
        cmpl    (%rcx), %edx
        jne     .L4
        movl    %esi, %eax              (D)
        movl    4(%rcx), %edx
        addq    8192(%rdi), %rax        (E)
        movzbl  (%rax,%rdx), %eax       (F)
        ret
.L4:
        jmp     read_byte_slow
        .cfi_endproc
.LFE1:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.8.0 20120610 (experimental)"
        .section        .note.GNU-stack,"",@progbits

1. Instruction (A) does a 64-bit move, however instruction (B) shows
that only the low 32 bits of the destination are live, and instruction
(C) shows that the source is already in zero-extended form.  Therefore
instruction (A) should just be a 32-bit 'movl %edx, %ecx'.

2. Instruction (D) is either a zero-extension, or a redundant move due
to poor register allocation.
The destination does need to be in zero-extended form to work in
instructions (E) and (F), but the source is already in zero-extended
form since instruction (G), so %rax should have been replaced with %rsi
in (E) and (F), and (D) should have been deleted.

The above was with gcc-4.8-20120610, but gcc-4.7-20120605 has the same
problem.  gcc-4.6.3 has the first problem but not the second, so the
likely path is one instruction shorter there.  Unfortunately gcc-4.6.3
chose %eax as the destination for the first load from %rsi, forcing it
to insert a compensation move back to %esi before the tailcall in the
unlikely path.
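
The reasoning in points 1 and 2 can be checked with a small C model of
the register semantics.  This is only a sketch and not part of the
attached test case: all function names below are made up for
illustration, the hmem base address is an arbitrary value, and the model
assumes the usual x86_64 rule that a write to a 32-bit register clears
the upper 32 bits of the corresponding 64-bit register (modelled here by
a cast to uint32_t that is then widened to uint64_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: registers are modelled as 64-bit values. */

/* Point 1, as emitted: (C) shrl, (A) movq, (B) andl. */
uint64_t index_via_movq(uint32_t esi)
{
    uint64_t rdx = (uint32_t)(esi >> 12);  /* (C) shrl $12, %edx  */
    uint64_t rcx = rdx;                    /* (A) movq %rdx, %rcx */
    rcx = (uint32_t)rcx & 1023u;           /* (B) andl $1023, %ecx */
    return rcx;
}

/* Point 1, as proposed: (A) becomes 'movl %edx, %ecx'. */
uint64_t index_via_movl(uint32_t esi)
{
    uint64_t rdx = (uint32_t)(esi >> 12);  /* (C) shrl $12, %edx  */
    uint64_t rcx = (uint32_t)rdx;          /* movl %edx, %ecx     */
    rcx = (uint32_t)rcx & 1023u;           /* (B) andl $1023, %ecx */
    return rcx;
}

/* Point 2, as emitted: (D) copies %esi to %eax, (E)/(F) use %rax. */
uint64_t addr_via_rax(uint64_t hmem, uint32_t loaded, uint32_t off)
{
    uint64_t rsi = loaded;                 /* (G) movl (%rsi), %esi     */
    uint64_t rax = (uint32_t)rsi;          /* (D) movl %esi, %eax       */
    uint64_t rdx = off;                    /* movl 4(%rcx), %edx        */
    rax += hmem;                           /* (E) addq 8192(%rdi), %rax */
    return rax + rdx;                      /* address loaded by (F)     */
}

/* Point 2, as proposed: (D) deleted, (E)/(F) use %rsi instead. */
uint64_t addr_via_rsi(uint64_t hmem, uint32_t loaded, uint32_t off)
{
    uint64_t rsi = loaded;                 /* (G) movl (%rsi), %esi */
    uint64_t rdx = off;                    /* movl 4(%rcx), %edx    */
    rsi += hmem;                           /* addq 8192(%rdi), %rsi */
    return rsi + rdx;                      /* address for the load  */
}

/* Compare emitted and proposed sequences on a few sample values. */
void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xfffu, 0x1000u, 0x00401000u,
        0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    const size_t n = sizeof samples / sizeof samples[0];
    const uint64_t hmem = 0x00007f0000000000ull;  /* arbitrary base */

    for (size_t i = 0; i < n; i++) {
        assert(index_via_movq(samples[i]) == index_via_movl(samples[i]));
        for (size_t j = 0; j < n; j++)
            assert(addr_via_rax(hmem, samples[i], samples[j]) ==
                   addr_via_rsi(hmem, samples[i], samples[j]));
    }
}
```

Calling check_equivalence() from a test program should pass without
tripping an assertion.  That only shows the two sequences agree on the
sampled inputs; it says nothing about why the register allocator or the
zero-extension elimination did not find the shorter form.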