The testcase below:

typedef unsigned char uint8_t;

#define UART3_LSR (*(volatile uint8_t *)(0x49020000+20))
#define UART3_RBR (*(volatile uint8_t *)(0x49020000+0))

int IsSerialBufferFull(void)
{
  return (UART3_LSR & 0x20) == 0;
}

void SendSerialByte(uint8_t byte)
{
  while (IsSerialBufferFull())
    ;
  UART3_RBR = byte;
}

when compiled with arm-none-eabi-gcc -O2 results in suboptimal code for
SendSerialByte:

SendSerialByte:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     r3, #1224736768
        add     r3, r3, #131072
.L3:
        ldrb    r2, [r3, #20]   @ zero_extendqisi2
        tst     r2, #32
        mov     r2, #1224736768
        add     r2, r2, #131072
        beq     .L3
        strb    r0, [r2, #0]
        bx      lr
        .size   SendSerialByte, .-SendSerialByte

The load of the constant for UART3_RBR (mov r2/add r2) should not be moved
into the loop, since it's not needed until after the loop.  Furthermore, the
necessary value is already available in r3.

The same problem doesn't happen on other platforms, e.g. mips-sde-elf gives:

        andi    $4,$4,0x00ff
        li      $2,1224867840           # 0x49020000
.L6:
        lbu     $3,20($2)
        andi    $3,$3,0x20
        beq     $3,$0,.L6
        nop

        sb      $4,0($2)
        j       $31
        nop

or powerpc-eabi:

        lis 9,0x4902
        ori 9,9,20
.L4:
        lbz 0,0(9)
        andi. 11,0,32
        beq+ 0,.L4
        lis 9,0x4902
        stb 3,0(9)
        blr

(which is not quite perfect, but better than the arm-none-eabi code...).

--
           Summary: constant address loads moved into loop unnecessarily
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: froydnj at gcc dot gnu dot org
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: arm-none-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40672