http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838
bin.cheng <amker.cheng at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker.cheng at gmail dot com

--- Comment #15 from bin.cheng <amker.cheng at gmail dot com> ---
The situation gets a little bit better on 4_9 trunk. The -Os assembly code on
cortex-m0 (thumb1 as reported) looks like this:

test:
        push    {r0, r1, r2, r4, r5, r6, r7, lr}
        mov     r6, r0
        mov     r4, #0
        str     r2, [sp, #4]
.L2:
        ldr     r2, [r6]
        cmp     r4, r2
        bge     .L7
        mov     r5, #0
        lsl     r7, r4, #2
        add     r2, r7, #4      <----move to before XXX
        str     r2, [sp]        <----spill
.L3:
        ldr     r3, [sp, #4]
        cmp     r5, r3
        bge     .L8
        ldr     r3, [r6, #4]
        ldr     r2, [sp]        <----spill
        ldr     r0, [r3, r7]
        ldr     r1, [r3, r2]    <----XXX
        bl      func
        add     r5, r5, #1
        b       .L3
.L8:
        add     r4, r4, #1
        b       .L2
.L7:
        @ sp needed
        pop     {r0, r1, r2, r4, r5, r6, r7, pc}
        .size   test, .-test

IVOPT chooses the original biv for all uses in the outer loop; the regression
comes from the long live range of "r2" and the corresponding spill.

I then realized that GCC's IVOPT computes an iv (for non-linear uses) at its
original place, so we may be able to teach IVOPT to compute the iv just before
it is used in order to shrink the iv's live range. The patch I had at
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html
is similar to this, except that it computes iv uses at the appropriate place
only for iv uses outside the loop.

But this idea won't help this specific case, because LIM will hoist all the
computation to basic block .L2 after IVOPT.
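
For readers without the PR's test case at hand, the two placements discussed above can be sketched at the source level. This is a hypothetical reconstruction inferred from the assembly listing, not the test case attached to the bug: the struct layout, the names `struct vec`, `test_hoisted` and `test_at_use`, and the body of `func` are all assumptions made for illustration. Where the offset is written in C does not control where the compiler computes it, so the two functions only illustrate the difference in live range and may well compile to identical code (as noted above, LIM would hoist the second form back).

```c
#include <stddef.h>

/* Hypothetical layout, guessed from the loads at [r6] and [r6, #4]. */
struct vec {
    int n;      /* outer trip count */
    int *a;     /* must hold at least n + 1 elements */
};

/* Stand-in for the external `func` in the listing (assumption):
   it accumulates its arguments so the two variants can be compared. */
static long calls_sum;
static void func(int x, int y) { calls_sum += (long)x * 31 + y; }

/* Placement A: the i + 1 index is computed once per outer iteration,
   so it stays live across the whole inner loop and every call. */
long test_hoisted(const struct vec *p, int m)
{
    calls_sum = 0;
    for (int i = 0; i < p->n; i++) {
        size_t next = (size_t)i + 1;
        for (int j = 0; j < m; j++)
            func(p->a[i], p->a[next]);
    }
    return calls_sum;
}

/* Placement B: the i + 1 index is computed right before its use,
   so its live range ends at the load that consumes it. */
long test_at_use(const struct vec *p, int m)
{
    calls_sum = 0;
    for (int i = 0; i < p->n; i++) {
        for (int j = 0; j < m; j++) {
            size_t next = (size_t)i + 1;
            func(p->a[i], p->a[next]);
        }
    }
    return calls_sum;
}
```

Both functions compute the same result; the only intended difference is how long the derived index must be kept around, which is what costs a spill slot on a register-starved target such as thumb1.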