https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32605
Jed Brown <jed at 59A2 dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jed at 59A2 dot org

--- Comment #5 from Jed Brown <jed at 59A2 dot org> ---
The missed optimization even exists for code such as this, which should compile
to a simple load on LE architectures.

unsigned read_u32_le(const unsigned char arr[]) {
  return (arr[0] << 0)
    | (arr[1] << 8)
    | (arr[2] << 16)
    | (arr[3] << 24);
}

gcc-8.3/trunk -O:

read_u32_le:
        movzx   eax, BYTE PTR [rdi+1]
        sal     eax, 8
        movzx   edx, BYTE PTR [rdi+2]
        sal     edx, 16
        or      eax, edx
        movzx   edx, BYTE PTR [rdi]
        or      eax, edx
        movzx   edx, BYTE PTR [rdi+3]
        sal     edx, 24
        or      eax, edx
        ret

clang-8 -O:

read_u32_le:                            # @read_u32_le
        mov     eax, dword ptr [rdi]
        ret

https://gcc.godbolt.org/z/8lGeCF
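
For anyone reproducing this, a minimal sketch of the equivalence the comment relies on: on a little-endian host, assembling four bytes with shifts and ORs yields the same value as a plain 32-bit load. All names below (read_u32_le_shift, read_u32_native, host_is_little_endian, check_readers_agree) are hypothetical and not part of the report. The sketch also assumes uint32_t casts are acceptable; without them, arr[3] is promoted to int, and shifting it left by 24 is undefined when the top bit of the byte is set and int is 32 bits wide. Nothing here claims what code any particular compiler version emits for either form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable little-endian read. Each byte is widened to uint32_t before the
   shift so that the shift by 24 never overflows a signed int. */
uint32_t read_u32_le_shift(const unsigned char *p) {
    return ((uint32_t)p[0] << 0)
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Native-endian read through memcpy, which avoids alignment and aliasing
   problems. It equals the little-endian value only on little-endian hosts. */
uint32_t read_u32_native(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 when the lowest-addressed byte of a multi-byte integer holds its
   least significant bits, i.e. on a little-endian host. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical self-check: on a little-endian host both readers must agree,
   which is why a single load is a valid lowering of the shift form there. */
void check_readers_agree(const unsigned char *p) {
    if (host_is_little_endian())
        assert(read_u32_le_shift(p) == read_u32_native(p));
}
```

Calling check_readers_agree on the bytes 0x78 0x56 0x34 0x12 passes on x86-64, where read_u32_le_shift returns 0x12345678. On a big-endian host the check is skipped, since the native load would see the bytes in the opposite order.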