It seems that GCC 4.2.1 generates better code than GCC 4.4.0 in this case:

The following code (extracted from Android's
Dalvik_java_lang_System_currentTimeMillis in native/java_lang_System.c):

// compilation options: -march=armv5te -mthumb -Os

struct timeval
{
 long tv_sec;
 long tv_usec;
};

extern void get_time(struct timeval*);

void test(long long *res)
{
 struct timeval tv;
 get_time(&tv);
 *res = tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}

is compiled by gcc-4.4.0 in a sub-optimal way: it takes 110 bytes (vs 74 bytes
when compiled by gcc-4.2.1). The assembly output shows that gcc-4.4.0 spills
some registers to the stack because the code that multiplies by 1000LL uses
more registers than it needs (and more than gcc-4.2.1 uses). The multiplication
code is similar, but gcc 4.4 emits several additional MOVs that could easily be
eliminated.

This bug is easier to demonstrate when tv_sec is multiplied by 10 instead of
1000 and the tv_usec / 1000 term is removed.
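
For reference, the reduced test case described above might look like the
following sketch. It is reconstructed from the description, not copied from
the actual file used to produce the listings below. The names my_timeval and
test_mul10 are hypothetical (the struct is renamed so the file cannot clash
with the C library's own struct timeval), and the get_time body is only a
stand-in so the file links by itself; in the original test case get_time is
an extern declaration.

```c
/* compilation options: -march=armv5te -mthumb -Os */

/* Same layout as the struct timeval in the original test case. */
struct my_timeval
{
  long tv_sec;
  long tv_usec;
};

/* Stand-in definition (an assumption): the original only declares
   "extern void get_time(struct timeval*);". noinline keeps the compiler
   from folding the multiplication into a constant. */
__attribute__((noinline)) void get_time(struct my_timeval *tv)
{
  tv->tv_sec = 123;
  tv->tv_usec = 0;
}

/* Reduced function: multiply by 10 only, with tv_usec / 1000 removed. */
void test_mul10(long long *res)
{
  struct my_timeval tv;
  get_time(&tv);
  *res = tv.tv_sec * 10LL;
}
```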

gcc 4.2.1:
       push    {r4, r5, lr}
       sub     sp, sp, #12
       mov     r5, r0
       mov     r0, sp
       bl      get_time
       ldr     r2, [sp]
       add     sp, sp, #12
       @ sp needed for prologue
       asr     r4, r2, #31
       mov     r3, r2
       lsr     r0, r2, #30
       lsl     r2, r4, #2
       orr     r2, r2, r0
       lsl     r1, r3, #2
       add     r1, r1, r3
       adc     r2, r2, r4
       lsr     r0, r1, #31
       lsl     r4, r2, #1
       orr     r4, r4, r0
       lsl     r3, r1, #1
       str     r3, [r5]
       str     r4, [r5, #4]
       pop     {r4, r5, pc}

gcc 4.4.0:
       push    {r4, r5, r6, r7, lr}          // note that gcc 4.2.1 uses only {r4, r5, lr}
       sub     sp, sp, #12
       mov     r4, r0
       mov     r0, sp
       bl      get_time
       ldr     r6, [sp]
       add     sp, sp, #12
       @ sp needed for prologue
       mov     r0, r6
       asr     r6, r6, #31
       lsr     r7, r0, #30
       lsl     r3, r6, #2
       orr     r3, r3, r7
       mov     r1, r6   // not needed actually, r6 can be used directly
       lsl     r2, r0, #2
       add     r0, r0, r2
       adc     r1, r1, r3
       lsr     r2, r0, #31
       lsl     r3, r1, #1
       orr     r3, r3, r2
       lsl     r0, r0, #1
       str     r0, [r4]
       str     r3, [r4, #4]
       pop     {r4, r5, r6, r7, pc}
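
Both listings compute the 64-bit product as ((x << 2) + x) << 1 on a pair of
32-bit registers. The C sketch below mirrors that sequence step by step to
make the data flow easier to follow; the function name mul10_shift_add and
the variable names are illustrative only, and int32_t stands in for the
32-bit long of the ARM target. Counting the values that are live at each step
gives a rough idea of how few registers the sequence really needs.

```c
#include <stdint.h>

/* Multiply a sign-extended 32-bit value by 10 using 32-bit halves only. */
int64_t mul10_shift_add(int32_t x)
{
  uint32_t lo = (uint32_t)x;               /* low word (ldr)            */
  uint32_t hi = (x < 0) ? 0xFFFFFFFFu : 0u; /* sign word (asr #31)       */

  /* x * 4: 64-bit shift left by 2 (lsr #30, lsl #2, orr, lsl #2). */
  uint32_t x4_hi = (hi << 2) | (lo >> 30);
  uint32_t x4_lo = lo << 2;

  /* x * 5: 64-bit add of x to x * 4 (add, adc). */
  uint32_t x5_lo = x4_lo + lo;
  uint32_t carry = (x5_lo < x4_lo) ? 1u : 0u;
  uint32_t x5_hi = x4_hi + hi + carry;

  /* x * 10: 64-bit shift left by 1 (lsr #31, lsl #1, orr, lsl #1). */
  uint32_t x10_hi = (x5_hi << 1) | (x5_lo >> 31);
  uint32_t x10_lo = x5_lo << 1;

  /* Reassemble the two words (the two str instructions). The conversion
     to int64_t assumes the usual two's-complement wraparound. */
  return (int64_t)(((uint64_t)x10_hi << 32) | x10_lo);
}
```

At most four 32-bit values (lo, hi, and the two halves of the running result)
are live at any point, plus the saved result pointer and one scratch register
for the bits shifted across the word boundary.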


-- 
           Summary: Bad register allocation in multiplication code by
                    constant
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sliao at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499
