The bug is triggered with -O2 -fprofile-use. Test case, loop.cpp:

int fun_b(int hbs[], int num, void *obj)
{
  int i;
  int s = 0;
  for (i = 0; i < num; i++) {
    if (obj != 0) {
      if ((int)obj - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  }
  return s;
}

int main ()
{
  int i;
  int s = 0;
  int hbs[100];
  for (i = 0; i < 100; ++i) {
    hbs[i] = i * 2000 + 100000;
  }
  for (i = 0; i < 20; ++i) {
    s += fun_b (hbs, 100, &hbs[i]);
  }
  return s;
}

Profile the program. Apparently the loop inside fun_b() is hot.

$ arm-eabi-g++ loop.cpp -O2 -fprofile-use --save-temps -c -o loop.o

In function fun_b, we see an empty loop (.L5) that runs when obj == 0.

_Z5fun_bPiiPv:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp     r1, #0
        stmfd   sp!, {r4, r5}
        mov     r3, r0
        ble     .L57
        cmp     r2, #0        <--- "if (obj != 0)" is moved out of loop
        beq     .L5
        ....
.L3:
        ldmfd   sp!, {r4, r5}
        bx      lr
.L5:                          ;; if (obj == 0), empty loop
        add     r2, r2, #1    ;;
        cmp     r2, r1        ;;
        bne     .L5           ;;
.L57:
        mov     r0, #0
        b       .L3

The empty loop (.L5) should have been eliminated. I have tested -O2
without -fprofile-use, where the empty loop is gone.

I find that the root cause of the inefficiency of -O2 FDO is that, during
unswitch-loops, the simplification of loop conditions is missed when FDO
is on.

Let's say:
Version A: "-O2 -funswitch-loops", which does the right thing.
Version B: "-O2 -fprofile-use", which generates an empty loop that should
be eliminated.

Before the unswitch-loops pass, the loop (inner-most, hot) is

  loop {
    if (obj != 0) {
      ...
    }
  }

Both Version A and Version B perform one pass of unswitch-loops on this
loop body. In function tree_unswitch_single_loop(), after
"nloop = tree_unswitch_loop (loop, bbs[i], cond)", the loop becomes

  if (obj != 0) {
    loop {                <---- original copy of the loop
      if (obj != 0) {
        ...
      }
    }
  } else {
    loop {                <----- "nloop": a new copy of the loop
      if (obj != 0) {
        ...
      }
    }
  }

Then, right before the end of tree_unswitch_single_loop(), gcc
recursively calls itself on the modified loops.

  tree_unswitch_single_loop (nloop, num + 1);

From here, Version A and Version B start to behave differently.

For Version A ("-O2 -funswitch-loops"), gcc continues looking for
unswitch-loop opportunities in the new loop "nloop". It finds that the
condition of the new loop can be simplified: since obj is 0 whenever
control reaches the new loop, gcc replaces obj by 0. Thus the loop becomes

  if (obj != 0) {
    loop {                <---- original copy of the loop
      if (obj != 0) {
        ...
      }
    }
  } else {
    loop {                <----- "nloop": a new copy of the loop
      if (0 != 0) {       <--- obj is replaced by "0"
        ...
      }
    }
  }

Therefore, in the TODO pass cleanup-cfg, the "nloop" is entirely removed.

However, for Version B ("-O2 -fprofile-use"), gcc finds that the "nloop"
is a cold loop, so it returns immediately, without checking whether the
condition can be simplified. Thus nloop is not cleaned up by the
following cleanup-cfg pass, and the result is an empty loop.

The problematic code is in unswitch_single_loop() in loop-unswitch.c.

static void
unswitch_single_loop (struct loop *loop, ...)
{
  ...
  /* Do not unswitch in cold areas.  */
  if (optimize_loop_for_size_p (loop))
    {
      dump
      return;
    }
  ...
  do
    {
      ...
      /* Check whether the result can be predicted.  */
      for (acond = cond_checked; acond; acond = XEXP (acond, 1))
        simplify_using_condition (XEXP (acond, 0), &cond, NULL);
      ...
    }
  while (repeat);
  ...
  /* Unswitch the loop on this condition.  */
  nloop = unswitch_loop (loop, bbs[i], cond, cinsn);
  ...
  /* Invoke itself on modified loops.  */
  unswitch_single_loop (nloop, rconds, num + 1);
  unswitch_single_loop (loop, conds, num + 1);
  ...
}

To fix the empty loop problem, my thought is to propagate the conditions
immediately after nloop is inserted, before the cold-loop check can
return early. Any suggestions?
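
For reference, the expected result of unswitching plus condition propagation can be written out by hand at source level. This is only a sketch of the transformation described above, not compiler output and not a patch. The names fun_b_original, fun_b_unswitched and check_same_result are made up for illustration, and std::intptr_t stands in for the (int) cast of the test case (an assumption, so that the sketch also builds on 64-bit hosts).

```cpp
#include <cassert>
#include <cstdint>

// The loop as written in loop.cpp. Assumption of this sketch: the pointer is
// converted with std::intptr_t instead of (int) so it compiles on LP64 hosts.
int fun_b_original(const int hbs[], int num, void *obj)
{
  int s = 0;
  for (int i = 0; i < num; i++) {
    if (obj != nullptr) {
      if (reinterpret_cast<std::intptr_t>(obj) - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  }
  return s;
}

// Hand-unswitched form: the loop-invariant test "obj != 0" is hoisted out of
// the loop, and the known value of obj is propagated into each copy.
int fun_b_unswitched(const int hbs[], int num, void *obj)
{
  int s = 0;
  if (obj != nullptr) {
    // Original copy of the loop: the inner "obj != 0" is known to be true.
    for (int i = 0; i < num; i++) {
      if (reinterpret_cast<std::intptr_t>(obj) - hbs[i] > 0) {
        s += hbs[i];
      }
    }
  } else {
    // "nloop": the body is guarded by "0 != 0", so it is dead and the whole
    // loop can be removed. Nothing is left to execute on this path.
  }
  return s;
}

// Hypothetical helper: both forms must return the same value for any obj.
void check_same_result(const int hbs[], int num, void *obj)
{
  assert(fun_b_original(hbs, num, obj) == fun_b_unswitched(hbs, num, obj));
}
```

With Version B, the else branch above still contains a counting loop with an empty body, which is the .L5 loop in the generated assembly.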

Thanks,
Jing

--
           Summary: Empty loop generated at unswitch-loops with -O2
                    -fprofile-use
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jingyu at google dot com
 GCC build triplet: X86_64-linux-gnu
  GCC host triplet: X86_64-linux-gnu
GCC target triplet: arm-unknown-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42720