https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94834
Bug ID: 94834
Summary: Failure to optimize loop bswap pattern
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

uint32_t load(const uint8_t* data)
{
    uint32_t val = 0;
    for (int i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT)
    {
        val |= *data++ << i;
    }
    return val;
}

On a little-endian target such as x86-64, this can be optimized to a single 32-bit load. LLVM does this transformation; GCC just unrolls the loop and misses it.

LLVM gives:

load(unsigned char const*): # @load(unsigned char const*)
  mov eax, dword ptr [rdi]
  ret

GCC gives:

load(unsigned char const*):
  movzx edx, BYTE PTR [rdi+1]
  movzx eax, BYTE PTR [rdi]
  sal edx, 8
  or edx, eax
  movzx eax, BYTE PTR [rdi+2]
  sal eax, 16
  or edx, eax
  movzx eax, BYTE PTR [rdi+3]
  sal eax, 24
  or eax, edx
  ret

See also https://godbolt.org/z/kmYTLZ
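
For reference, the equivalence the report relies on can be sketched as below. This is a minimal illustration and not part of the original report: the names load_loop, load_memcpy and loads_agree are hypothetical, and the memcpy version matches the loop only on a little-endian host. The loop also adds a cast to uint32_t, which the reported test case does not have, so that shifting a byte of 0x80 or more by 24 does not overflow the promoted int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Byte-by-byte little-endian load, following the loop in the report.
 * Assumption: the cast to uint32_t is added here so the shift is done
 * in an unsigned type; the original test case shifts a promoted int. */
uint32_t load_loop(const uint8_t *data)
{
    uint32_t val = 0;
    for (unsigned i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT) {
        val |= (uint32_t)*data++ << i;
    }
    return val;
}

/* The form the loop is expected to collapse to: one 32-bit load.
 * memcpy avoids alignment and aliasing problems; the result equals
 * load_loop only when the host is little-endian. */
uint32_t load_memcpy(const uint8_t *data)
{
    uint32_t val;
    memcpy(&val, data, sizeof(val));
    return val;
}

/* Hypothetical self-check: asserts that both loads agree for the four
 * bytes at data. Expected to hold on little-endian hosts only. */
void loads_agree(const uint8_t *data)
{
    assert(load_loop(data) == load_memcpy(data));
}
```

Comparing the code generated for load_loop and load_memcpy at -O2 is one way to see whether a given compiler version performs the transformation.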