https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345
Bug ID: 103345
Summary: missed optimization: add/xor individual bytes to form a word
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc at rjk dot terraraq.uk
Target Milestone: ---

All code generated with godbolt's idea of 'trunk'. See
https://godbolt.org/z/Wcj61PKKG

Source:

#include <stdint.h>

uint32_t load_le_32_or(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        | ((uint32_t)ptr[1] << 8)
        | ((uint32_t)ptr[2] << 16)
        | ((uint32_t)ptr[3] << 24);
}

uint32_t load_le_32_add(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        + ((uint32_t)ptr[1] << 8)
        + ((uint32_t)ptr[2] << 16)
        + ((uint32_t)ptr[3] << 24);
}

uint32_t load_le_32_xor(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        ^ ((uint32_t)ptr[1] << 8)
        ^ ((uint32_t)ptr[2] << 16)
        ^ ((uint32_t)ptr[3] << 24);
}

The ^ version is admittedly a bit of an odd choice but the + version is a
reasonably natural way to write the code.

Code on gcc -O2:

load_le_32_or:
        mov     eax, DWORD PTR [rdi]
        ret
load_le_32_add:
        movzx   eax, BYTE PTR [rdi+1]
        movzx   edx, BYTE PTR [rdi+2]
        sal     eax, 8
        sal     edx, 16
        add     eax, edx
        movzx   edx, BYTE PTR [rdi]
        add     eax, edx
        movzx   edx, BYTE PTR [rdi+3]
        sal     edx, 24
        add     eax, edx
        ret
load_le_32_xor:
        movzx   eax, BYTE PTR [rdi+1]
        movzx   edx, BYTE PTR [rdi+2]
        sal     eax, 8
        sal     edx, 16
        xor     eax, edx
        movzx   edx, BYTE PTR [rdi]
        xor     eax, edx
        movzx   edx, BYTE PTR [rdi+3]
        sal     edx, 24
        xor     eax, edx
        ret

Code on clang -O2:

load_le_32_or:                          # @load_le_32_or
        mov     eax, dword ptr [rdi]
        ret
load_le_32_add:                         # @load_le_32_add
        mov     eax, dword ptr [rdi]
        ret
load_le_32_xor:                         # @load_le_32_xor
        mov     eax, dword ptr [rdi]
        ret
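
For reference, the three forms are interchangeable because each zero-extended byte is shifted into its own 8-bit lane, so no two terms ever have a set bit in the same position. With disjoint bits, + produces no carries and ^ cancels nothing, so both give the same value as |. The following is only a minimal sketch of that equivalence, not a statement about how either compiler detects the pattern; forms_agree is a hypothetical helper that is not part of the report.

```c
#include <assert.h>
#include <stdint.h>

/* The three loads from the report, repeated so this sketch stands alone. */
uint32_t load_le_32_or(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        | ((uint32_t)ptr[1] << 8)
        | ((uint32_t)ptr[2] << 16)
        | ((uint32_t)ptr[3] << 24);
}

uint32_t load_le_32_add(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        + ((uint32_t)ptr[1] << 8)
        + ((uint32_t)ptr[2] << 16)
        + ((uint32_t)ptr[3] << 24);
}

uint32_t load_le_32_xor(const uint8_t *ptr) {
    return ((uint32_t)ptr[0])
        ^ ((uint32_t)ptr[1] << 8)
        ^ ((uint32_t)ptr[2] << 16)
        ^ ((uint32_t)ptr[3] << 24);
}

/* Hypothetical helper (not from the report): returns 1 when the |, + and ^
 * forms all produce the same word for the 4 bytes at ptr, 0 otherwise. */
int forms_agree(const uint8_t *ptr) {
    uint32_t v = load_le_32_or(ptr);
    return v == load_le_32_add(ptr) && v == load_le_32_xor(ptr);
}
```

Calling forms_agree on any 4-byte buffer should return 1, including the all-0xff case where a carry would show up if the lanes overlapped.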