http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58322
            Bug ID: 58322
           Summary: similar simple code produces different (and non-optimal) result
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michael at reinelt dot co.at

avr-gcc produces different assembler code for the same statement in a very
simple test case:

#include <stdint.h>
#include <avr/io.h>

// if avr/io.h is not available:
// #define _MMIO_BYTE(mem_addr) (*(volatile uint8_t *)(mem_addr))
// #define _SFR_MEM8(mem_addr) _MMIO_BYTE(mem_addr)
// #define UCSR0B _SFR_MEM8(0xC1)

char flag;

void test1(void)
{
    UCSR0B |= 1;
}

void test2(void)
{
    if (flag) {
        UCSR0B |= 1;
    }
}

Result:

test1:
    ldi r30,lo8(-63)     ;  tmp44,
    ldi r31,0            ;
    ld r24,Z             ;  D.1400, MEM[(volatile uint8_t *)193B]
    ori r24,lo8(1)       ;  D.1400,
    st Z,r24             ;  MEM[(volatile uint8_t *)193B], D.1400
    ret

test2:
    lds r24,flag         ;  flag, flag
    tst r24              ;  flag
    breq .L2             ; ,
    lds r24,193          ;  D.1397, MEM[(volatile uint8_t *)193B]
    ori r24,lo8(1)       ;  D.1397,
    sts 193,r24          ;  MEM[(volatile uint8_t *)193B], D.1397
.L2:
    ret

In test1, the simple bit-set in memory (the target is a UART control register)
is done by indirect addressing through the Z register, while in test2 (inside
the if() body) the same statement is compiled to a direct load/store.

The read-modify-write sequence has the same size in both cases (5 words), but
the test1 version is slower (7 cycles instead of 5), uses more registers, and,
last but not least, looks more complicated :-)

I tried to play around with some rtl-dump options (I am not familiar with RTL)
and found that there is a change in pass 162 (cprop1), where the addressing in
test2 changes from indirect to direct (resulting in lds/sts instead of ld/st
through Z), while the code in test1 does not change.
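
The size and timing figures above can be checked with a small tally. This is only a sketch: the names `insn_cost`, `test1_seq`, `test2_seq`, `total_words` and `total_cycles` are made up for illustration, and the per-instruction costs are assumed from the AVR instruction set timings for classic 8-bit cores (1 word / 1 cycle for ldi and ori, 1 word / 2 cycles for ld/st through Z, 2 words / 2 cycles for lds/sts). Only the read-modify-write of the register is counted; the ret and the flag test in test2 are left out.

```c
#include <stddef.h>

/* Cost of one instruction: flash size in 16-bit words and execution cycles.
 * The values used below are assumptions for a classic 8-bit AVR core. */
typedef struct {
    const char *mnemonic;
    unsigned words;
    unsigned cycles;
} insn_cost;

/* test1: indirect access through the Z register (r31:r30). */
static const insn_cost test1_seq[] = {
    { "ldi r30,lo8(-63)", 1, 1 },
    { "ldi r31,0",        1, 1 },
    { "ld r24,Z",         1, 2 },
    { "ori r24,lo8(1)",   1, 1 },
    { "st Z,r24",         1, 2 },
};

/* test2 (body of the if): direct access with lds/sts. */
static const insn_cost test2_seq[] = {
    { "lds r24,193",      2, 2 },
    { "ori r24,lo8(1)",   1, 1 },
    { "sts 193,r24",      2, 2 },
};

/* Sum the code size, in words, of a sequence of n instructions. */
unsigned total_words(const insn_cost *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].words;
    return sum;
}

/* Sum the execution time, in cycles, of a sequence of n instructions. */
unsigned total_cycles(const insn_cost *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].cycles;
    return sum;
}
```

Under these assumed costs, total_words() gives 5 for both sequences, while total_cycles() gives 7 for test1_seq and 5 for test2_seq, which matches the numbers quoted in the report.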