https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94834

            Bug ID: 94834
           Summary: Failure to optimize loop bswap pattern
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <climits>
#include <cstdint>

uint32_t load(const uint8_t* data)
{
    uint32_t val = 0;
    for (int i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT)
    {
        val |= *data++ << i;
    }
    return val;
}

On a little-endian target such as x86-64, this loop can be optimized to a
single 32-bit load. LLVM performs this transformation; GCC only unrolls the
loop and does not merge the four byte loads.
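
For reference, here is a minimal sketch of the equivalence the optimizer is
expected to recognize. The names load_loop and load_memcpy are hypothetical
and not part of the original testcase. The loop variant also casts the byte
to uint32_t before shifting, which the original does not. The two functions
agree only on a little-endian target.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

// Byte-by-byte little-endian load, following the loop in the report.
// Assumption: the cast to uint32_t is added here so the shift is done
// in an unsigned type; the reported testcase shifts a promoted int.
uint32_t load_loop(const uint8_t* data)
{
    uint32_t val = 0;
    for (unsigned i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT)
    {
        val |= static_cast<uint32_t>(*data++) << i;
    }
    return val;
}

// Hypothetical hand-written form of the desired result: a single
// (possibly unaligned) 32-bit load. It returns the same value as
// load_loop only when the target is little-endian.
uint32_t load_memcpy(const uint8_t* data)
{
    uint32_t val;
    std::memcpy(&val, data, sizeof(val));
    return val;
}
```

On x86-64, compilers typically lower the memcpy form to a single mov, which
is the code LLVM also produces for the loop form.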

LLVM gives:

load(unsigned char const*): # @load(unsigned char const*)
  mov eax, dword ptr [rdi]
  ret

GCC gives:

load(unsigned char const*):
  movzx edx, BYTE PTR [rdi+1]
  movzx eax, BYTE PTR [rdi]
  sal edx, 8
  or edx, eax
  movzx eax, BYTE PTR [rdi+2]
  sal eax, 16
  or edx, eax
  movzx eax, BYTE PTR [rdi+3]
  sal eax, 24
  or eax, edx
  ret

See also https://godbolt.org/z/kmYTLZ
