[Bug tree-optimization/58143] [4.8/4.9 regression] wrong code at -O3

bernd.edlinger at hotmail dot de Wed, 21 Aug 2013 14:50:57 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58143


--- Comment #9 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to Jakub Jelinek from comment #8)
> That patch looks wrong, and would very likely penalize tons of code, this
> predicate is used in many places in the compiler and the operations don't
> trap.

Yes, thanks, I agree.

This means that the "lim" pass (and probably others, such as "ifcvt")
will move code out of the inner loop as long as it does not trap.
But the hoisted code can produce undefined results, and the loop
optimization should not use that to throw away the loop termination code.

In this case I'd say the only other simple solution would be to remove
the function infer_loop_bounds_from_signedness() in
tree-ssa-loop-niter.c completely, right?

To illustrate what this function can do, here is another example:

loop.c:

extern int bar ();

int
foo ()
{
  int i, k;
  for (i=0; i<4; i++)
  {
     k=1000000000*i;
     if (bar ())
       break;
  }
  return k;
}

If you compile this function with -O3, the resulting code is
very surprising (and there are no warnings):

foo:
.LFB0:
        .cfi_startproc
        subl    $12, %esp
        .cfi_def_cfa_offset 16
        call    bar
        testl   %eax, %eax
        jne     .L3
        call    bar
        testl   %eax, %eax
        .p2align 4,,4
        jne     .L4
        .p2align 4,,6
        call    bar
        movl    $2000000000, %eax
.L2:
        addl    $12, %esp
        .cfi_remember_state
        .cfi_def_cfa_offset 4
        ret
        .p2align 4,,7
        .p2align 3
.L3:
        .cfi_restore_state
        xorl    %eax, %eax
        jmp     .L2
        .p2align 4,,7
        .p2align 3
.L4:
        movl    $1000000000, %eax
        jmp     .L2


Because k would overflow in the fourth iteration,
the loop is terminated after the third iteration!
The reasoning is that the only way to avoid the undefined
behaviour in k is for one of the first three invocations of bar
to terminate the loop, and thus the loop is unrolled only 3 times.
But if the loop is a bit more complex, it will not be unrolled,
and in that case the normal loop termination condition "i<4"
will not be used at all, resulting in an endless loop.
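
The arithmetic behind that reasoning can be checked with a small sketch.
The helper names below are hypothetical and not part of GCC; the sketch
assumes a 32-bit int, as on the i386 target used above, and only shows
which iteration's product stops fitting in an int:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of GCC): nonzero if 1000000000 * i
   cannot be represented in an int. The product is formed in long long,
   which is at least 64 bits wide, so the check itself cannot overflow. */
static int product_overflows_int(int i)
{
    long long wide = 1000000000LL * i;
    return wide > INT_MAX || wide < INT_MIN;
}

/* Index of the first iteration of "for (i=0; i<4; i++)" whose product
   overflows, or -1 if none does. */
static int first_overflowing_iteration(void)
{
    for (int i = 0; i < 4; i++)
        if (product_overflows_int(i))
            return i;
    return -1;
}

/* Sanity check, assuming a 32-bit int. */
static void check_example(void)
{
    assert(!product_overflows_int(2));   /* 2000000000 <= INT_MAX */
    assert(product_overflows_int(3));    /* 3000000000 >  INT_MAX */
    assert(first_overflowing_iteration() == 3);
}
```

With a 32-bit int, only i == 3 overflows, which matches the three calls
to bar in the unrolled code.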

To prevent the loop unrolling I can add a printf:

loop.c:

extern int bar ();

int
foo ()
{
  int i, k;
  for (i=0; i<4; i++)
  {
     k=1000000000*i;
     __builtin_printf("loop %d\n", i);
     if (bar ())
       break;
  }
  return k;
}

Now this is an endless loop (bar always returns 0, but the
compiler does not know that)!

foo:
.LFB0:
        .cfi_startproc
        pushl   %ebx
        .cfi_def_cfa_offset 8
        .cfi_offset 3, -8
        xorl    %ebx, %ebx
        subl    $24, %esp
        .cfi_def_cfa_offset 32
.L2:
        movl    %ebx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        call    bar
        testl   %eax, %eax
        jne     .L6
        addl    $1, %ebx
        .p2align 4,,3
        jmp     .L2
        .p2align 4,,7
        .p2align 3
.L6:
        addl    $24, %esp
        .cfi_def_cfa_offset 8
        imull   $1000000000, %ebx, %eax
        popl    %ebx
        .cfi_restore 3
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc
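
For comparison, here is a sketch of the same loop written with unsigned
arithmetic, where wrap-around is defined and so cannot be used to infer
a loop bound. It is not a proposed fix for the bug. The names
foo_wrapping, bar_calls and bar_stop_after are hypothetical, and bar is a
local stub standing in for the external function. The return type is
changed to unsigned so that no out-of-range conversion back to int is
needed:

```c
#include <assert.h>

static int bar_calls;       /* number of calls made to the stub */
static int bar_stop_after;  /* stub returns nonzero from this call on;
                               0 means it never does */

/* Stub standing in for the external bar() of the example. */
static int bar(void)
{
    bar_calls++;
    return bar_stop_after != 0 && bar_calls >= bar_stop_after;
}

/* Same loop as foo(), but k and i are unsigned: the multiplication
   wraps modulo 2^N instead of being undefined. */
static unsigned foo_wrapping(void)
{
    unsigned i, k = 0;
    for (i = 0; i < 4; i++) {
        k = 1000000000u * i;
        if (bar())
            break;
    }
    return k;
}

/* Sanity check, assuming a 32-bit unsigned int. */
static void check_wrapping(void)
{
    bar_calls = 0;
    bar_stop_after = 0;              /* bar never asks to stop */
    assert(foo_wrapping() == 3000000000u);
    assert(bar_calls == 4);          /* "i<4" ended the loop */
}
```

When bar never returns nonzero, the loop has to end through "i<4" after
four iterations. GCC also has the -fwrapv option, which makes signed
overflow wrap; whether it changes the code shown above was not tested
here.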

[Bug tree-optimization/58143] [4.8/4.9 regression] wrong code at -O3
