https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67701
Bug ID: 67701 Summary: Unnecessary/bad instructions for u32-casted access to external symbol (assumes misaligned, superfluous load) Product: gcc Version: 5.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jwerner at chromium dot org Target Milestone: --- Consider the following sample program: static inline void write32(void *addr, unsigned int value) { *(volatile unsigned int *)addr = value; } extern unsigned char testvalue[]; void main() { write32(&testvalue, 4); } GCC 5.2 (with -O2 -fno-stack-protector -ffreestanding -fomit-frame-pointer) generates: main: mov r1, #4 mov r2, #0 ldr r3, .L2 ldrb r0, [r3] @ zero_extendqisi2 strb r1, [r3] ldrb r1, [r3, #1] @ zero_extendqisi2 strb r2, [r3, #1] ldrb r1, [r3, #2] @ zero_extendqisi2 strb r2, [r3, #2] ldrb r1, [r3, #3] @ zero_extendqisi2 strb r2, [r3, #3] bx lr Whereas GCC 4.9 (same arguments) generates: main: movw r3, #:lower16:testvalue movs r2, #4 movt r3, #:upper16:testvalue ldr r1, [r3] str r2, [r3] bx lr I think there are two problems with this: 1. Both GCC 5.2 and 4.9 generate superfluous load instructions to |addr|, even though the pointer is never dereferenced as an rvalue (only as the left-hand side of an assignment). This is a) unnecessary and b) dangerous since the pointer is declared volatile (meaning it could be an MMIO register with read side-effects). 2. GCC 5.2 seems to somehow assume it must treat the object as unaligned and generates byte-wise accesses. I don't understand why it would do that, since nothing here is declared __attribute__((packed)) and the explicit cast to a 4-byte-aligned type should remove any ambiguity for the compiler about which alignment it can assume. It is important for systems code to have a (portable) way to dereference an address with an explicit alignment regardless of where it came from (e.g. for MMIO registers that only support 32-bit wide accesses). Both of these issues can be solved by using __builtin_assume_aligned(addr, sizeof(unsigned long)), but I still think the compiler shouldn't have done this in the first place (and having a portable solution to this problem is preferable). Context: http://review.coreboot.org/#/c/11698