https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110
            Bug ID: 66110
           Summary: uint8_t memory access not optimized
           Product: gcc
           Version: 5.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kevin at koconnor dot net
  Target Milestone: ---

It appears that gcc does not do a good job of optimizing memory accesses to
'uint8_t' variables. In particular, it appears as if "strict aliasing" is not
done on uint8_t variables, and it appears it is not done even if the uint8_t is
in a struct.

============ GCC version:

$ ~/src/install-5.1.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/kevin/src/install-5.1.0/bin/gcc
COLLECT_LTO_WRAPPER=/home/kevin/src/install-5.1.0/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.1.0/configure --prefix=/home/kevin/src/install-5.1.0 --enable-languages=c
Thread model: posix
gcc version 5.1.0 (GCC)

=========== Compile command line:

~/src/install-5.1.0/bin/gcc -v -save-temps -O2 -Wall u8alias.c -c

=========== Contents of u8alias.c:

typedef __UINT8_TYPE__ uint8_t;
typedef __UINT16_TYPE__ uint16_t;

struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

void f1(struct s2 *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

=========== Contents of u8alias.i:

# 1 "u8alias.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "u8alias.c"
typedef unsigned char uint8_t;
typedef short unsigned int uint16_t;

struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

void f1(struct s2 *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

=========== Results of compilation:

$ objdump -ldr u8alias.o

0000000000000000 <f1>:
f1():
   0:   48 8b 07                mov    (%rdi),%rax
   3:   c6 40 02 09             movb   $0x9,0x2(%rax)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   c6 40 02 0a             movb   $0xa,0x2(%rax)
   e:   c3                      retq

=========== Expected results:

I expected the second redundant load to be eliminated - for example, clang
produces this assembler (after replacing the gcc specific uint8_t typedefs with
an include of <stdint.h>):

$ clang -Wall -O2 u8alias.c -c
$ objdump -ldr u8alias.o

0000000000000000 <f1>:
f1():
   0:   48 8b 07                mov    (%rdi),%rax
   3:   c6 40 02 0a             movb   $0xa,0x2(%rax)
   7:   c3                      retq

=========== Other notes:

If the code is changed so that there are two redundant writes to ->f1 then gcc
does properly optimize away the first store. Also, if the above definition of
f2 is changed to an 8-bit bitfield (ie, "uint8_t f2 : 8;") then gcc does
properly optimize away the first store.

This occurs on other platforms besides x86_64. (In particular, it happens on
avr-gcc where 8-bit integers are very common.) I reproduced the above on gcc
5.1.0, but I've also seen it on variants of gcc 4.8 and gcc 4.9.

My guess is that the above is the result of an interpretation of the C99
specification - in particular section 6.5:

  An object shall have its stored value accessed only by an lvalue expression
  that has one of the following types: [...] -- a character type.

However, I do not think that should apply to the above test case for either of
the two following reasons:

1 - the memory access was not to a character type, it was to an integer type
    that happened to be 1 byte in size (ie, a uint8_t type)

2 - the memory access was not to a character type, it was to a member of
    'struct s1'.

As noted above, clang (eg, 3.4.2) does perform the expected optimization.
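
For anyone trying to reproduce the comparison, the two variants described under "Other notes" can be written out roughly as below. Variants A and B follow the report's description. Variant C is a possible source-level workaround that is not part of the report: it copies the pointer into a local first, so no second load of p->p1 is written in the source at all. The names store_f1_twice, store_bitfield_twice, store_f2_cached, s1_bf and s2_bf are made up for this sketch, and I have not checked the generated code for variant C on every gcc version.

```c
#include <stdint.h>

/* Same layout as in u8alias.c above. */
struct s1 {
    uint16_t f1;
    uint8_t f2;
};

struct s2 {
    struct s1 *p1;
};

/* Variant A: two stores to the uint16_t member.  Per the notes above,
 * gcc removes the first store in this case. */
void store_f1_twice(struct s2 *p)
{
    p->p1->f1 = 9;
    p->p1->f1 = 10;
}

/* Variant B: f2 declared as an 8-bit bitfield.  A bitfield of type
 * uint8_t is an implementation-defined extension; it is used here
 * because the report uses it. */
struct s1_bf {
    uint16_t f1;
    uint8_t f2 : 8;
};

struct s2_bf {
    struct s1_bf *p1;
};

void store_bitfield_twice(struct s2_bf *p)
{
    p->p1->f2 = 9;
    p->p1->f2 = 10;
}

/* Variant C (hypothetical workaround, not from the report): read the
 * pointer once into a local.  The address of q is never taken, so a
 * store through q cannot modify q itself. */
void store_f2_cached(struct s2 *p)
{
    struct s1 *q = p->p1;

    q->f2 = 9;
    q->f2 = 10;
}
```

Each function can be compiled with "gcc -O2 -c" and inspected with "objdump -ldr" in the same way as u8alias.o above.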