https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

            Bug ID: 66890
           Summary: function splitting only works with profile feedback
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Consider this simple example:

volatile int count;

int main()
{
        int i;
        for (i = 0; i < 100000; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}

The default "EQ is unlikely" heuristic in predict.* predicts that the branch
if (i == 999) is unlikely to be taken. So the tracer moves the count *= 2 basic
block out of line to conserve instruction cache.

gcc50 -O2 -S thotcold.c

        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $100000, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret
# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4
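
For comparison, the same prediction can be stated explicitly in the source with
__builtin_expect instead of relying on the default heuristic. This is only a
sketch: run_loop_hinted and its parameter n are hypothetical names that are not
part of the test case above, and I have not checked whether the explicit hint
changes the generated layout.

```c
volatile int count;

/* run_loop_hinted is a hypothetical name used only for this sketch. */
int run_loop_hinted(int n)
{
        int i;
        for (i = 0; i < n; i++) {
                /* __builtin_expect(expr, 0) tells GCC that expr is
                   expected to evaluate to 0, i.e. the branch is cold. */
                if (__builtin_expect(i == 999, 0))
                        count *= 2;
                count++;
        }
        return count;
}
```

Compiling this with -O2 -S and comparing against the listing above shows
whether the explicit hint and the heuristic lead to the same block placement.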


Now if I enable -freorder-blocks-and-partition, I would expect the block to
also be put into .text.unlikely to give an even better cache layout. But that
is not what happens: it generates the same code.

Only when I use actual profile feedback together with
-freorder-blocks-and-partition does the code actually end up in a separate
section.

(it also unrolled the loop, so the code looks a bit different)

gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out 
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c 
...
        .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $100000, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx


-freorder-blocks-and-partition should already use the extra section even
without profile feedback. 

I tested some larger programs and without profile feedback the unlikely section
is always empty.

The heuristics in predict.* often work quite well and a lot of code would
benefit from moving cold code out of the way of the caches.

This would make it possible to use the option to improve frontend-bound code
without needing full profile feedback.
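
Until then, a possible manual workaround is to move the cold block into its own
function marked with the cold attribute, which GCC may place in a separate text
subsection on targets that support it. This is a sketch under that assumption,
not something measured above; double_count and run_loop are hypothetical names.

```c
volatile int count;

/* Hypothetical helper: cold marks the function as rarely executed,
   noinline keeps it from being merged back into the hot loop. */
__attribute__((cold, noinline))
static void double_count(void)
{
        count *= 2;
}

/* Hypothetical wrapper around the loop from the test case. */
int run_loop(int n)
{
        int i;
        for (i = 0; i < n; i++) {
                if (i == 999)
                        double_count();
                count++;
        }
        return count;
}
```

This has to be done by hand for every cold path, which is exactly the work the
heuristics in predict.* could do automatically.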
