https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69933
Bug ID: 69933
Summary: non-ideal branch layout for an early-out return
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---

(Just guessing about this being an RTL bug; please reassign if it's target-specific or something else.)

This simple linked-list traversal compiles to slightly bulkier code than it needs to:

int traverse(struct foo_head *ph) {
    int a = -1;
    struct foo *p, *pprev;
    pprev = p = ph->h;
    while (p != NULL) {
        pprev = p;
        p = p->n;
    }
    if (pprev)
        a = pprev->a;
    return a;
}

(gcc 5.3.0 -O3 on godbolt: http://goo.gl/r8vb5L)

        movq    (%rdi), %rdx
        movl    $-1, %eax       ; only needs to happen in the early-out case
        testq   %rdx, %rdx
        jne     .L3             ; jne/ret or je / fall-through would be better
        jmp     .L9
.L5:
        movq    %rax, %rdx
.L3:
        movq    (%rdx), %rax
        testq   %rax, %rax
        jne     .L5
        movl    8(%rdx), %eax
        ret
.L9:                            ; ARM / PPC gcc 4.8.2 put the a = -1 down here
        ret                     ; this is a rep ret without -mtune=intel

Clang 3.7 chooses a better layout, with a je .early_out instead of the jne / jmp. It arranges the loop so it can enter at the top. It actually looks pretty optimal:

        movq    (%rdi), %rcx
        movl    $-1, %eax
        testq   %rcx, %rcx
        je      .LBB0_3
.LBB0_1:                                # %.lr.ph
        movq    %rcx, %rax
        movq    (%rax), %rcx
        testq   %rcx, %rcx
        jne     .LBB0_1
        movl    8(%rax), %eax
.LBB0_3:                                # %._crit_edge.thread
        retq

Getting the mov $-1 out of the common case would require a separate mov/ret block after the normal ret, so it's a code-size tradeoff that isn't worth it: a mov-immediate is dirt cheap. Anyway, there are a couple of different ways to lay out the branches and the mov $-1, %eax, but gcc's choice is in no way optimal. :(