[Bug middle-end/39838] [4.7/4.8/4.9 regression] unoptimal code for two simple loops

amker.cheng at gmail dot com Fri, 13 Dec 2013 02:24:17 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838


bin.cheng <amker.cheng at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker.cheng at gmail dot com

--- Comment #15 from bin.cheng <amker.cheng at gmail dot com> ---
The situation is a little better on the 4.9 trunk.  The -Os assembly code for
cortex-m0 (Thumb-1, as reported) looks like this:
test:
    push    {r0, r1, r2, r4, r5, r6, r7, lr}
    mov    r6, r0
    mov    r4, #0
    str    r2, [sp, #4]
.L2:
    ldr    r2, [r6]
    cmp    r4, r2
    bge    .L7
    mov    r5, #0
    lsl    r7, r4, #2
    add    r2, r7, #4   <----move to before XXX 
    str    r2, [sp]     <----spill
.L3:
    ldr    r3, [sp, #4]
    cmp    r5, r3
    bge    .L8
    ldr    r3, [r6, #4]
    ldr    r2, [sp]     <----spill
    ldr    r0, [r3, r7]
    ldr    r1, [r3, r2] <----XXX
    bl    func
    add    r5, r5, #1
    b    .L3
.L8:
    add    r4, r4, #1
    b    .L2
.L7:
    @ sp needed
    pop    {r0, r1, r2, r4, r5, r6, r7, pc}
    .size    test, .-test
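
For reference, the loop nest behind this listing can be sketched in C roughly
as follows.  This is a reconstruction from the register usage above, not the
test case attached to the bug: the struct layout, the names "struct s",
"count" and "data", the unused second parameter, and the recording body of
func are all assumptions, made only so that the sketch can be compiled and run
on its own.

```c
/* Hypothetical layout inferred from the loads: [r6] is the outer bound,
   [r6, #4] is a pointer to 4-byte elements (on a 32-bit target). */
struct s {
    int count;
    int *data;
};

/* Stand-in for the external callee; it only records what it was given. */
int func_calls;
long func_sum;

void func(int a, int b)
{
    func_calls++;
    func_sum += a + b;
}

/* The second parameter is assumed unused: r1 is never read in the listing. */
void test(struct s *p, int unused, int m)
{
    int i, j;

    (void)unused;
    for (i = 0; i < p->count; i++)      /* r4, bound reloaded from [r6] */
        for (j = 0; j < m; j++)         /* r5, bound reloaded from [sp, #4] */
            func(p->data[i], p->data[i + 1]);
}
```

Note that data[i + 1] is read on the last outer iteration, so in this sketch
the array needs count + 1 elements.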

IVOPT chooses the original biv for all uses in the outer loop; the regression
comes from the long live range of "r2" and the corresponding spill.
I then realized that GCC's IVOPT computes an iv (for non-linear uses) at its
original place.  We may be able to teach IVOPT to compute the iv just before
it is used, in order to shrink the iv's live range.  The patch I posted at
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html is similar, except that
it only places the computation appropriately for iv uses outside the loop.

However, this idea won't help in this specific case, because LIM will hoist
all of the computation to basic block .L2 after IVOPT.
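
To make the two placements concrete, here is a minimal hand-written C sketch;
it is not taken from any compiler dump, and every name in it is hypothetical.
"early" computes the second index once per outer iteration, so the value stays
live across the whole inner loop and the call, which is the situation behind
the spill above.  "late" computes it immediately before the use.  Both perform
the same computation, and because the value is invariant in the inner loop, an
invariant-motion pass is free to turn "late" into "early" again.

```c
/* Hypothetical callee: counts calls and keeps a checksum of its arguments. */
int calls;
unsigned long seen;

void record(int a, int b)
{
    calls++;
    seen = seen * 31u + (unsigned long)(a * 7 + b);
}

/* Index computed at its "original place": live across the inner loop. */
void early(const int *data, int n, int m)
{
    int i, j;

    for (i = 0; i < n; i++) {
        int next = i + 1;               /* long live range */
        for (j = 0; j < m; j++)
            record(data[i], data[next]);
    }
}

/* Index computed just before its use: short live range, but invariant
   in the inner loop, so it is a candidate for hoisting. */
void late(const int *data, int n, int m)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++) {
            int next = i + 1;           /* recomputed on every iteration */
            record(data[i], data[next]);
        }
}
```

The real transformations happen on GCC's internal representations rather than
on source like this, so the sketch only illustrates the live-range trade-off.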

[Bug middle-end/39838] [4.7/4.8/4.9 regression] unoptimal code for two simple loops
