Hi,

I am comparing the assembly generated by compilers targeting the arm-wince
platform, and it seems that the cross-compiler built from gcc trunk produces
less optimized code than an older one based on gcc 4.1.x.
Here is the comparison obtained from objdump:


cegcc-4.1.x:

00011000 <WinMainCRTStartup>:
   11000:       e92d40f0        push    {r4, r5, r6, r7, lr}
   11004:       e1a04000        mov     r4, r0
   11008:       e1a05001        mov     r5, r1
   1100c:       e1a06002        mov     r6, r2
   11010:       e1a07003        mov     r7, r3
   11014:       eb0000de        bl      11394 <_fpreset>
   11018:       eb00002a        bl      110c8 <_pei386_runtime_relocator>
   1101c:       eb000099        bl      11288 <__atexit_init>
   11020:       eb0000d3        bl      11374 <__gccmain>
   11024:       e1a01005        mov     r1, r5
   11028:       e1a00004        mov     r0, r4
   1102c:       e1a02006        mov     r2, r6
   11030:       e1a03007        mov     r3, r7
   11034:       eb000005        bl      11050 <WinMain>
   11038:       e1a04000        mov     r4, r0
   1103c:       eb000087        bl      11260 <_cexit>
   11040:       e1a01004        mov     r1, r4
   11044:       e3a00042        mov     r0, #66 ; 0x42
   11048:       eb0000d4        bl      113a0 <TerminateProcess>
   1104c:       eafffffe        b       1104c <WinMainCRTStartup+0x4c>

cegcc-4.4.x:

00011000 <WinMainCRTStartup>:
   11000:       e92d4010        push    {r4, lr}
   11004:       e1a04000        mov     r4, r0
   11008:       e24dd00c        sub     sp, sp, #12     ; 0xc
   1100c:       e58d1008        str     r1, [sp, #8]
   11010:       e58d2004        str     r2, [sp, #4]
   11014:       e58d3000        str     r3, [sp]
   11018:       eb000120        bl      114a0 <_fpreset>
   1101c:       eb000043        bl      11130 <_pei386_runtime_relocator>
   11020:       eb0000ce        bl      11360 <__atexit_init>
   11024:       eb000111        bl      11470 <__gccmain>
   11028:       e59d1008        ldr     r1, [sp, #8]
   1102c:       e1a00004        mov     r0, r4
   11030:       e59d2004        ldr     r2, [sp, #4]
   11034:       e59d3000        ldr     r3, [sp]
   11038:       eb000028        bl      110e0 <WinMain>
   1103c:       e1a04000        mov     r4, r0
   11040:       eb0000ba        bl      11330 <_cexit>
   11044:       e1a01004        mov     r1, r4
   11048:       e3a00042        mov     r0, #66 ; 0x42
   1104c:       eb000116        bl      114ac <TerminateProcess>
   11050:       eafffffe        b       11050 <WinMainCRTStartup+0x50>
   11054:       e1a00000        nop                     (mov r0,r0)
   11058:       e1a00000        nop                     (mov r0,r0)
   1105c:       e1a00000        nop                     (mov r0,r0)

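If you look at the instructions right after the push (addresses 11004-11014),
you can see that the old gcc keeps the four incoming arguments in registers
(r4-r7), whereas the upcoming gcc-4.4 keeps only r0 in a register and spills
r1-r3 to the stack.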
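
To make the pattern easier to discuss, here is a minimal sketch of the shape
of the startup code as I read it from the disassembly: four incoming
arguments must stay live across four calls before being handed to WinMain.
The names startup_sketch, winmain_stub and the *_stub helpers are
hypothetical stand-ins, not the real cegcc runtime sources, and the argument
types are my assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins for _fpreset, _pei386_runtime_relocator,
   __atexit_init and __gccmain. In a real test these would be extern
   functions so the compiler cannot inline them away. */
static int init_calls;
static void fpreset_stub(void)     { init_calls++; }
static void relocator_stub(void)   { init_calls++; }
static void atexit_init_stub(void) { init_calls++; }
static void gccmain_stub(void)     { init_calls++; }

/* Hypothetical stand-in for WinMain; it only combines its arguments so the
   result depends on all four of them. */
static int winmain_stub(void *inst, void *prev, void *cmdline, int show)
{
    return (inst != 0) + (prev != 0) + (cmdline != 0) + show;
}

/* The four parameters arrive in r0-r3 on ARM and are needed again after the
   four calls, so the compiler must either copy them to callee-saved
   registers or spill them to the stack. */
int startup_sketch(void *inst, void *prev, void *cmdline, int show)
{
    int before = init_calls;

    fpreset_stub();
    relocator_stub();
    atexit_init_stub();
    gccmain_stub();
    assert(init_calls - before == 4); /* all four init stubs ran */

    return winmain_stub(inst, prev, cmdline, show);
}
```

With the stubs replaced by extern declarations, compiling this with both
cross-compilers should show whether the register-versus-stack difference
appears outside the real startup file as well.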

I tried adding optimization flags (-O2), but it does not change the result.
Is there anything I can do to improve this? Is this normal behavior?
Maybe my remark is not relevant, because I have not run any benchmark, and I
agree that the fact that gcc trunk does not optimize this specific part does
not mean the resulting code will be slower overall.
I have also noticed that I now get some nop instructions after the function,
and when I ask gcc to generate assembly I can see that the alignment
directive is different. I used to get .align 0 with gcc-4.1 and now I get
.align 4. How can I change that?
Finally, maybe those nop instructions prevent the compiler from optimizing...
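
As far as I understand, GNU as on ARM treats the operand of .align as a power
of two, so .align 4 would request a 16-byte boundary. The small sketch below
(align_up and padding_nops are hypothetical helper names of mine) checks that
this reading matches the listings: the 4.4 function ends at 0x11054, and
padding up to 0x11060 takes exactly the three nops shown.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to a 2^log2_align boundary, which is how I assume
   `.align n` is interpreted on this target. */
static uint32_t align_up(uint32_t addr, unsigned log2_align)
{
    assert(log2_align < 32);
    uint32_t mask = (UINT32_C(1) << log2_align) - 1;
    return (addr + mask) & ~mask;
}

/* Number of 4-byte ARM nops needed to pad from end_addr to the boundary. */
static unsigned padding_nops(uint32_t end_addr, unsigned log2_align)
{
    return (align_up(end_addr, log2_align) - end_addr) / 4;
}
```

For example, padding_nops(0x11054, 4) gives 3, and padding_nops(0x11050, 0)
gives 0, which agrees with the gcc-4.1 listing having no padding at all.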



Thanks