http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55162

             Bug #: 55162
           Summary: Loop ivopts cuts off top bits of loop counter
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: olege...@gcc.gnu.org
            Target: sh*-*-*


The following function:

int test (int* x, unsigned int c)
{
  int s = 0;
  unsigned int i;
  for (i = 0; i < c; ++i)
    s += x[i];
  return s;
}

compiled for SH (-O2 -m4 -ml) results in the following code:

        tst     r5,r5       // c == 0 ?
        bt/s    .L6
        mov     #0,r0
        shll2   r5          // c <<= 2
        add     #-4,r5      // c += -4
        shlr2   r5          // c >>= 2 (unsigned shift)
        add     #1,r5       // c += 1
.L3:
        mov.l   @r4+,r1
        dt      r5
        bf/s    .L3
        add     r1,r0
.L6:
        rts
        nop

If the function above is invoked with c = 0x80000000, the loop will run
only 0x40000000 iterations, which looks suspicious.  For example, passing
a virtual address 0x00001000 and c = 0x80000000 to the function should
actually run over the address range 0x00001000 .. 0x80001000, not
0x00001000 .. 0x40001000.

I've also checked this on ARM.  There, the loop counter is transformed
into the end address, and the loop compares addresses instead of using a
decrement-and-test insn:

        cmp     r1, #0
        beq     .L4
        mov     r3, r0
        add     r1, r0, r1, asl #2
        mov     r0, #0
.L3:
        ldr     r2, [r3], #4
        cmp     r3, r1
        add     r0, r0, r2
        bne     .L3
        bx      lr
.L4:
        mov     r0, r1
        bx      lr

The same could be done on SH, too (comparing against the end address
instead of using a loop counter), but it would add loop setup overhead.

In the optimal case the above function would result in the following SH
code:

        tst     r5,r5
        bt/s    .L6
        mov     #0,r0
.L3:
        mov.l   @r4+,r1
        dt      r5
        bf/s    .L3
        add     r1,r0
.L6:
        rts
        nop

This problem is present on rev 193061 as well as on the 4.7 branch.
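
For reference, the counter setup in the first SH listing (shll2, add #-4,
shlr2, add #1) can be modelled in plain C.  This is only a sketch: it
assumes 32-bit registers with wraparound and that each insn behaves as
annotated in the listing's comments.  The name sh_trip_count is
hypothetical and does not exist in GCC.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC): models the four insns that
   adjust r5 before the .L3 loop in the first SH listing, assuming
   32-bit wraparound arithmetic.  */
uint32_t sh_trip_count (uint32_t c)
{
  c <<= 2;   /* shll2 r5   : the top two bits of c are shifted out */
  c -= 4;    /* add #-4,r5 : wraps around if the shift produced 0  */
  c >>= 2;   /* shlr2 r5   : logical shift, zeros are shifted in   */
  c += 1;    /* add #1,r5                                          */
  return c;
}
```

Under this model, sh_trip_count (0x80000000) yields 0x40000000, which
matches the iteration count described above, while values of c from 1 up
to 0x40000000 come out unchanged.  The c == 0 case never reaches this
sequence because of the tst / bt/s check at the top of the listing.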