https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69933
Bug ID: 69933
Summary: non-ideal branch layout for an early-out return
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---

(Just guessing about this being an RTL bug; please reassign if it's target-specific or something else.)

This simple linked-list traversal compiles to slightly bulkier code than it needs to:

int traverse(struct foo_head *ph) {
    int a = -1;
    struct foo *p, *pprev;
    pprev = p = ph->h;
    while (p != NULL) {
        pprev = p;
        p = p->n;
    }
    if (pprev)
        a = pprev->a;
    return a;
}

(gcc 5.3.0 -O3 on godbolt: http://goo.gl/r8vb5L)

        movq    (%rdi), %rdx
        movl    $-1, %eax       ; only needs to happen in the early-out case
        testq   %rdx, %rdx
        jne     .L3             ; jne/ret or je / fall-through would be better
        jmp     .L9
.L5:
        movq    %rax, %rdx
.L3:
        movq    (%rdx), %rax
        testq   %rax, %rax
        jne     .L5
        movl    8(%rdx), %eax
        ret
.L9:                            ; ARM / PPC gcc 4.8.2 put the a = -1 down here
        ret                     ; this is a rep ret without -mtune=intel

Clang 3.7 chooses a better layout, with a je .early_out instead of the jne / jmp. It arranges the loop so it can enter at the top. It actually looks pretty optimal:

        movq    (%rdi), %rcx
        movl    $-1, %eax
        testq   %rcx, %rcx
        je      .LBB0_3
.LBB0_1:                                # %.lr.ph
        movq    %rcx, %rax
        movq    (%rax), %rcx
        testq   %rcx, %rcx
        jne     .LBB0_1
        movl    8(%rax), %eax
.LBB0_3:                                # %._crit_edge.thread
        retq

Getting the mov $-1 out of the common case would require a separate mov/ret block after the normal ret, so it's a code-size tradeoff that isn't worth it: a mov-immediate is dirt cheap. Anyway, there are a couple of different ways to lay out the branches and the mov $-1, %eax, but gcc's choice is in no way optimal. :(