Compiling the attached source code with the options -march=armv7-a -mthumb -Os,
gcc generates:

foo2:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #20
        mov     r7, r2
        str     r0, [sp, #8]
        movs    r0, #64
        bl      malloc
        str     r0, [sp, #4]            //        C1
        cmp     r0, #0                  //        C2
        beq     .L2
        ldr     r6, [sp, #8]
        movs    r5, #0
        movs    r4, #2
        b       .L3
.L8:
        add     r3, r7, r4, lsl #2      // A1
        str     r3, [sp, #12]           // A2
        ldr     r3, [r7, r4, lsl #2]    //    B1
        mov     r0, r3                  //    B2
        str     r3, [sp, #0]            //    B3
        bl      foo1
        ldr     r3, [sp, #0]            //    B4
        cbnz    r0, .L4
        mov     r0, r3                  //    B5
        bl      bar
        ldr     r3, [sp, #0]            //    B6
        cbz     r0, .L5
        ldr     r3, [sp, #12]           // A3
.L7:
        adds    r4, r4, #1
        cmp     r4, r6
        bge     .L6
        ldr     r0, [r3, #4]!           // A4
        str     r3, [sp, #0]            // A5
        bl      foo1
        ldr     r3, [sp, #0]            // A6
        cmp     r0, #0
        bne     .L7
.L6:
        ldr     r3, [r7, r4, lsl #2]    //    B7
        mov     r6, r4
.L5:
        ldr     r0, [r3, #0]            //    B8
        ldr     r1, [sp, #8]
        bl      foo
        ldr     r3, [sp, #4]            //       C3
        str     r0, [r3, r5, lsl #2]    //       C4
        adds    r5, r5, #1
.L4:
        adds    r4, r4, #1
.L3:
        cmp     r4, r6
        blt     .L8
.L2:
        ldr     r0, [sp, #4]            //       C5
        add     sp, sp, #20
        pop     {r4, r5, r6, r7, pc}

Instructions that involve high registers (r8 and above) are usually larger than
those that use only low registers (r0-r7). But if we use high registers cleverly,
we can improve both code size and performance. This works especially well when
it removes register spills, and for instructions whose encodings are already
4 bytes long regardless of which registers they use.
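The trade-off above can be written down as a simple byte count. The sketch below is a hypothetical cost model: the struct and function names are made up for illustration, and it is not how GCC's register allocator actually decides. A live range is worth moving into a high register when the spill and reload bytes it removes outweigh the bytes added by wider encodings plus any extra prologue/epilogue cost.

```c
#include <stdbool.h>

/* Hypothetical size model, all values in bytes; not GCC's actual heuristic. */
struct high_reg_choice {
    int spill_bytes_removed;    /* spill/reload instructions that are deleted */
    int encoding_growth;        /* 16-bit instructions forced into 32-bit forms */
    int prologue_epilogue_cost; /* extra bytes to save/restore the register */
};

/* A negative result means the function gets smaller. */
static int net_size_change(struct high_reg_choice c)
{
    return c.encoding_growth + c.prologue_epilogue_cost - c.spill_bytes_removed;
}

static bool worth_using_high_reg(struct high_reg_choice c)
{
    return net_size_change(c) < 0;
}
```

Instructions that are 32-bit whatever registers they use add nothing to `encoding_growth`, which is what makes high registers cheap for them.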

The first example is the instructions marked An, which cover one live range of
register r3. Among them, A2, A3, A5 and A6 are register spill and reload
instructions. If we replace r3 with r8, these 4 instructions can be removed,
while A1 and A4 keep their size because they are already 4 bytes long. The cost
we must pay is saving and restoring the high register in the function prologue
and epilogue, but it is still a win.
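A byte count for the An range, as a sketch. It assumes the standard Thumb-2 encodings: SP-relative LDR/STR with a low register is 16-bit, the shifted-register ADD (A1) and the pre-indexed writeback LDR (A4) are 32-bit whatever the registers, and adding any high register to the 16-bit push {r4, r5, r6, r7, lr} / pop {r4, r5, r6, r7, pc} forces the 32-bit PUSH.W/POP.W forms (+2 bytes each). The table layout and names are illustrative only.

```c
#include <stddef.h>

/* Sizes in bytes, assuming standard Thumb-2 (ARMv7-A) encodings.
   "after" is 0 when the instruction disappears. */
struct insn_size {
    const char *label;
    int before; /* value kept in r3 and spilled to the stack */
    int after;  /* value kept in r8 */
};

static const struct insn_size case_a[] = {
    {"A1 add r3, r7, r4, lsl #2", 4, 4}, /* shifted-register ADD: 32-bit either way */
    {"A2 str r3, [sp, #12]",      2, 0}, /* spill: deleted */
    {"A3 ldr r3, [sp, #12]",      2, 0}, /* reload: deleted */
    {"A4 ldr r0, [r3, #4]!",      4, 4}, /* writeback LDR: 32-bit either way */
    {"A5 str r3, [sp, #0]",       2, 0}, /* spill: deleted */
    {"A6 ldr r3, [sp, #0]",       2, 0}, /* reload: deleted */
};

/* The 16-bit PUSH/POP only encode r0-r7 plus lr/pc, so saving r8 needs the
   32-bit PUSH.W and POP.W: +2 bytes each. */
static const int prologue_epilogue_cost = 4;

static int case_a_net_bytes(void)
{
    int delta = prologue_epilogue_cost;
    for (size_t i = 0; i < sizeof case_a / sizeof case_a[0]; i++)
        delta += case_a[i].after - case_a[i].before;
    return delta; /* negative means smaller code */
}
```

Under these assumptions the four deleted 2-byte instructions save 8 bytes against 4 bytes of extra prologue/epilogue, and two of the removed memory accesses (A5, A6) sit inside the inner loop.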

Now consider another live range of register r3, in the instructions marked Bn.
If we replace r3 with r9, instructions B3, B4 and B6 can be removed because they
are spills and reloads. Instruction B8 becomes larger, because the 16-bit LDR
immediate form only accepts low registers. The size of the other Bn instructions
is unchanged. So this is also a win.
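The same kind of count for the Bn range, again a sketch under assumed standard Thumb-2 encodings: the 16-bit MOV (register) form accepts high registers, so B2 and B5 keep their size, while `ldr r0, [r9]` needs a 32-bit encoding. The `high_regs_already_saved` parameter is hypothetical. It models the fact that once r8 already forces PUSH.W/POP.W for the An range, adding r9 to the register lists costs no extra code bytes, only one more word of stack traffic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sizes in bytes under standard Thumb-2 encodings (an assumption of this
   sketch). "after" is 0 when the instruction disappears. */
struct insn_size {
    const char *label;
    int before; /* value kept in r3 and spilled to [sp, #0] */
    int after;  /* value kept in r9 */
};

static const struct insn_size case_b[] = {
    {"B1 ldr r3, [r7, r4, lsl #2]", 4, 4}, /* shifted register offset: 32-bit */
    {"B2 mov r0, r3",               2, 2}, /* 16-bit MOV accepts high registers */
    {"B3 str r3, [sp, #0]",         2, 0}, /* spill: deleted */
    {"B4 ldr r3, [sp, #0]",         2, 0}, /* reload: deleted */
    {"B5 mov r0, r3",               2, 2},
    {"B6 ldr r3, [sp, #0]",         2, 0}, /* reload: deleted */
    {"B7 ldr r3, [r7, r4, lsl #2]", 4, 4},
    {"B8 ldr r0, [r3, #0]",         2, 4}, /* ldr r0, [r9] needs the 32-bit form */
};

static int case_b_net_bytes(bool high_regs_already_saved)
{
    /* PUSH/POP grow from 16-bit to 32-bit only for the first high register. */
    int delta = high_regs_already_saved ? 0 : 4;
    for (size_t i = 0; i < sizeof case_b / sizeof case_b[0]; i++)
        delta += case_b[i].after - case_b[i].before;
    return delta; /* negative means smaller code */
}
```

With r8 already saved for the An range this gives -4 bytes. On its own the Bn rewrite is size-neutral, but it still removes three stack accesses around the calls in the loop.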

The third example in this test case is the value stored in memory at [sp, #4],
which is accessed by the instructions marked Cn. We can keep this value in
register r10 instead. Then instructions C1 and C2 can be rewritten as
"mov r10, r0", the reload in C3 can be removed, r3 in C4 can be replaced by r10,
and C5 can be replaced by "mov r0, r10".
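A sketch of the Cn byte count under the same assumed Thumb-2 encodings. One extra assumption is labeled in the code: the sketch keeps the `cmp r0, #0` at C2, because `beq .L2` still needs the flags and `mov r10, r0` does not set them. As before, `high_regs_already_saved` is a hypothetical parameter for whether the 32-bit PUSH.W/POP.W cost has already been paid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sizes in bytes under standard Thumb-2 encodings (an assumption of this
   sketch). "after" is 0 when the instruction disappears. */
struct insn_size {
    const char *label;
    int before; /* value kept in the stack slot [sp, #4] */
    int after;  /* value kept in r10 */
};

static const struct insn_size case_c[] = {
    {"C1 str r0, [sp, #4]",         2, 2}, /* becomes mov r10, r0 (16-bit) */
    {"C2 cmp r0, #0",               2, 2}, /* assumed kept: beq .L2 needs flags */
    {"C3 ldr r3, [sp, #4]",         2, 0}, /* reload: deleted */
    {"C4 str r0, [r3, r5, lsl #2]", 4, 4}, /* base becomes r10, still 32-bit */
    {"C5 ldr r0, [sp, #4]",         2, 2}, /* becomes mov r0, r10 (16-bit) */
};

static int case_c_net_bytes(bool high_regs_already_saved)
{
    /* PUSH/POP grow from 16-bit to 32-bit only for the first high register. */
    int delta = high_regs_already_saved ? 0 : 4;
    for (size_t i = 0; i < sizeof case_c / sizeof case_c[0]; i++)
        delta += case_c[i].after - case_c[i].before;
    return delta; /* negative means smaller code */
}
```

Under these assumptions the Cn change saves 2 bytes when it shares the high-register save with the An and Bn ranges, and it also drops a stack reload (C3) from the loop body.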


-- 
           Summary: Use high registers to reduce code size and improve
                    performance when targeting thumb2
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: carrot at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43216
