Hi, I am comparing the assembly generated by compilers targeting the arm-wince platform, and it seems that the cross-compiler built from gcc trunk optimizes less well than an older one based on gcc 4.1.x. Here is the comparison obtained from objdump:
cegcc-4.1.x:

00011000 <WinMainCRTStartup>:
   11000:  e92d40f0  push  {r4, r5, r6, r7, lr}
   11004:  e1a04000  mov   r4, r0
   11008:  e1a05001  mov   r5, r1
   1100c:  e1a06002  mov   r6, r2
   11010:  e1a07003  mov   r7, r3
   11014:  eb0000de  bl    11394 <_fpreset>
   11018:  eb00002a  bl    110c8 <_pei386_runtime_relocator>
   1101c:  eb000099  bl    11288 <__atexit_init>
   11020:  eb0000d3  bl    11374 <__gccmain>
   11024:  e1a01005  mov   r1, r5
   11028:  e1a00004  mov   r0, r4
   1102c:  e1a02006  mov   r2, r6
   11030:  e1a03007  mov   r3, r7
   11034:  eb000005  bl    11050 <WinMain>
   11038:  e1a04000  mov   r4, r0
   1103c:  eb000087  bl    11260 <_cexit>
   11040:  e1a01004  mov   r1, r4
   11044:  e3a00042  mov   r0, #66 ; 0x42
   11048:  eb0000d4  bl    113a0 <TerminateProcess>
   1104c:  eafffffe  b     1104c <WinMainCRTStartup+0x4c>

cegcc-4.4.x:

00011000 <WinMainCRTStartup>:
   11000:  e92d4010  push  {r4, lr}
   11004:  e1a04000  mov   r4, r0
   11008:  e24dd00c  sub   sp, sp, #12 ; 0xc
   1100c:  e58d1008  str   r1, [sp, #8]
   11010:  e58d2004  str   r2, [sp, #4]
   11014:  e58d3000  str   r3, [sp]
   11018:  eb000120  bl    114a0 <_fpreset>
   1101c:  eb000043  bl    11130 <_pei386_runtime_relocator>
   11020:  eb0000ce  bl    11360 <__atexit_init>
   11024:  eb000111  bl    11470 <__gccmain>
   11028:  e59d1008  ldr   r1, [sp, #8]
   1102c:  e1a00004  mov   r0, r4
   11030:  e59d2004  ldr   r2, [sp, #4]
   11034:  e59d3000  ldr   r3, [sp]
   11038:  eb000028  bl    110e0 <WinMain>
   1103c:  e1a04000  mov   r4, r0
   11040:  eb0000ba  bl    11330 <_cexit>
   11044:  e1a01004  mov   r1, r4
   11048:  e3a00042  mov   r0, #66 ; 0x42
   1104c:  eb000116  bl    114ac <TerminateProcess>
   11050:  eafffffe  b     11050 <WinMainCRTStartup+0x50>
   11054:  e1a00000  nop   (mov r0,r0)
   11058:  e1a00000  nop   (mov r0,r0)
   1105c:  e1a00000  nop   (mov r0,r0)

If you look at addresses 11008-1100c, you can see that the old gcc keeps the arguments in registers, whereas the upcoming gcc-4.4 stores them to memory on the stack. I tried adding optimization flags (-O2), but that does not change the result. Is there anything I can do to improve this? Is this normal behavior?
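For reference, here is a rough C sketch of what the routine does, reconstructed only from the call sequence in the listings above. It is not the actual cegcc startup source: startup_sketch and all the *_stub functions are hypothetical names with placeholder bodies, and the real routine ends by calling TerminateProcess instead of returning.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the runtime routines named in the listings.
   Their bodies are placeholders that only count how often they are called. */
static int init_calls = 0;
static void fpreset_stub(void)           { init_calls++; }
static void runtime_relocator_stub(void) { init_calls++; }
static void atexit_init_stub(void)       { init_calls++; }
static void gccmain_stub(void)           { init_calls++; }
static void cexit_stub(void)             { }

/* Placeholder for the application's WinMain: it combines its arguments so a
   caller can check that all four values survived the earlier calls. */
static int winmain_stub(void *inst, void *prev, void *cmdline, int show)
{
    return (inst != NULL) + (prev != NULL) + (cmdline != NULL) + show;
}

/* Sketch of the startup routine. The four incoming arguments (r0-r3 on ARM)
   stay live across four calls, so the compiler has to either copy them into
   callee-saved registers (r4-r7, as in the 4.1 listing) or store them on the
   stack (as the 4.4 listing does for r1-r3). */
int startup_sketch(void *inst, void *prev, void *cmdline, int show)
{
    fpreset_stub();
    runtime_relocator_stub();
    atexit_init_stub();
    gccmain_stub();

    int ret = winmain_stub(inst, prev, cmdline, show);

    cexit_stub();
    return ret; /* the real routine passes this value to TerminateProcess */
}
```

Compiling a function of this shape with both compilers should be enough to compare how each one preserves the arguments, without needing the whole runtime.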
Maybe my remark is not relevant, because I have not run any benchmarks, and I agree that gcc trunk not optimizing this specific part does not necessarily mean the code will be slower. I have also noticed that I now get some nop instructions, and when I ask gcc to generate assembly I can see that the alignment directive is different: I used to get .align 0 with gcc-4.1 and now I get .align 4. How can I change that? Finally, maybe those nop insns prevent the compiler from optimizing...

Thanks