Given the following input:

char greater_than_64(long long x, long long y) {
        return x > y;
}
char greater_than_32(long x, long y) {
        return x > y;
}

And running "avr-gcc -Os -c lol.c -o lol.o", I get the following output

00000064 <greater_than_32>:
  64:   ef 92           push    r14
  66:   ff 92           push    r15
  68:   0f 93           push    r16
  6a:   1f 93           push    r17
  6c:   e2 2e           mov     r14, r18
  6e:   f3 2e           mov     r15, r19
  70:   04 2f           mov     r16, r20
  72:   15 2f           mov     r17, r21
  74:   31 e0           ldi     r19, 0x01       ; 1
  76:   e6 16           cp      r14, r22
  78:   f7 06           cpc     r15, r23
  7a:   08 07           cpc     r16, r24
  7c:   19 07           cpc     r17, r25
  7e:   04 f0           brlt    .+0             ; 0x80 <greater_than_32+0x1c>
  80:   30 e0           ldi     r19, 0x00       ; 0
  82:   83 2f           mov     r24, r19
  84:   1f 91           pop     r17
  86:   0f 91           pop     r16
  88:   ff 90           pop     r15
  8a:   ef 90           pop     r14
  8c:   08 95           ret

This is much longer than necessary; the input values could be left in their
original registers, rather than moved before comparison. I'm not sure why GCC
does this. The 8-byte version is even worse:

00000000 <greater_than_64>:
   0:   af 92           push    r10
   2:   bf 92           push    r11
   4:   cf 92           push    r12
   6:   df 92           push    r13
   8:   ef 92           push    r14
   a:   ff 92           push    r15
   c:   0f 93           push    r16
   e:   1f 93           push    r17
  10:   e1 e0           ldi     r30, 0x01       ; 1
  12:   19 17           cp      r17, r25
  14:   04 f0           brlt    .+0             ; 0x16 <greater_than_64+0x16>
  16:   91 17           cp      r25, r17
  18:   01 f4           brne    .+0             ; 0x1a <greater_than_64+0x1a>
  1a:   08 17           cp      r16, r24
  1c:   00 f0           brcs    .+0             ; 0x1e <greater_than_64+0x1e>
  1e:   80 17           cp      r24, r16
  20:   01 f4           brne    .+0             ; 0x22 <greater_than_64+0x22>
  22:   f7 16           cp      r15, r23
  24:   00 f0           brcs    .+0             ; 0x26 <greater_than_64+0x26>
  26:   7f 15           cp      r23, r15
  28:   01 f4           brne    .+0             ; 0x2a <greater_than_64+0x2a>
  2a:   e6 16           cp      r14, r22
  2c:   00 f0           brcs    .+0             ; 0x2e <greater_than_64+0x2e>
  2e:   6e 15           cp      r22, r14
  30:   01 f4           brne    .+0             ; 0x32 <greater_than_64+0x32>
  32:   d5 16           cp      r13, r21
  34:   00 f0           brcs    .+0             ; 0x36 <greater_than_64+0x36>
  36:   5d 15           cp      r21, r13
  38:   01 f4           brne    .+0             ; 0x3a <greater_than_64+0x3a>
  3a:   c4 16           cp      r12, r20
  3c:   00 f0           brcs    .+0             ; 0x3e <greater_than_64+0x3e>
  3e:   4c 15           cp      r20, r12
  40:   01 f4           brne    .+0             ; 0x42 <greater_than_64+0x42>
  42:   b3 16           cp      r11, r19
  44:   00 f0           brcs    .+0             ; 0x46 <greater_than_64+0x46>
  46:   3b 15           cp      r19, r11
  48:   01 f4           brne    .+0             ; 0x4a <greater_than_64+0x4a>
  4a:   a2 16           cp      r10, r18
  4c:   00 f0           brcs    .+0             ; 0x4e <greater_than_64+0x4e>
  4e:   e0 e0           ldi     r30, 0x00       ; 0
  50:   8e 2f           mov     r24, r30
  52:   1f 91           pop     r17
  54:   0f 91           pop     r16
  56:   ff 90           pop     r15
  58:   ef 90           pop     r14
  5a:   df 90           pop     r13
  5c:   cf 90           pop     r12
  5e:   bf 90           pop     r11
  60:   af 90           pop     r10
  62:   08 95           ret

Here there are many redundant jumps as well as pushes and pops. It seems like
these could be trivially eliminated with a peephole pass.

I'm running a build straight out of trunk as of today:
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/avr/libexec/gcc/avr/4.5.0/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr
--enable-languages=c --disable-nls --disable-libssp --with-dwarf2 :
(reconfigured) ../configure --prefix=/usr/local/avr --target=avr
--enable-languages=c --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.5.0 20100216 (experimental) (GCC)


-- 
           Summary: [avr] GCC produces poor code for 4- and 8-byte values
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jdpotter at andrew dot cmu dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43095

Reply via email to