https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118550
Bug ID: 118550
Summary: Missed optimization for fusing two byte loads with offsets
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arseny.kapoulkine at gmail dot com
Target Milestone: ---
When presented with the following code:

#include <stdint.h>

/* TYPE stands for the offset type under test: int, size_t, ptrdiff_t, ... */
uint16_t readle(const unsigned char* data, TYPE offset)
{
    uint8_t b0 = data[offset], b1 = data[offset + 1];
    return b0 | (b1 << 8);
}
gcc always generates inefficient code when targeting x64: it loads the two
bytes separately instead of fusing them into a single 16-bit load, regardless
of the type of offset (int, size_t, ptrdiff_t).
For example, with int offset, gcc trunk generates:

    movsx rsi, esi
    movzx eax, BYTE PTR [rdi+1+rsi]
    movzx edx, BYTE PTR [rdi+rsi]
    sal eax, 8
    or eax, edx
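
For what it's worth, a memcpy-based formulation of the same read does get
merged into a single 16-bit load by gcc at -O2 today, so the pattern itself is
within reach (a minimal sketch assuming a little-endian target; readle_memcpy
is just an illustrative name):

#include <stdint.h>
#include <string.h>

/* gcc expands the fixed-size 2-byte memcpy inline and emits one 16-bit
   load, which is the code the shift-or idiom above should also produce.
   Assumes a little-endian target such as x86-64. */
uint16_t readle_memcpy(const unsigned char* data, TYPE offset)
{
    uint16_t v;
    memcpy(&v, data + offset, sizeof v);
    return v;
}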
clang generates efficient code with just a single 2-byte load for all offset
types except "unsigned int", where it has to handle wraparound of offset + 1.
This includes size_t (where overflow is also well defined, but presumably
offset can never be SIZE_MAX because data + offset would then overflow the
pointer?). For int offset, clang generates:

    movsxd rax, esi
    movzx eax, word ptr [rdi + rax]
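
To spell out the unsigned int hazard mentioned above (a sketch; readle_u is
just an illustrative name):

#include <stdint.h>

uint16_t readle_u(const unsigned char* data, unsigned int offset)
{
    /* If offset == UINT_MAX, offset + 1 wraps to 0, so b1 comes from
       data[0] rather than the byte after b0. Folding the two loads into
       one 16-bit load at data + offset would be wrong in that case,
       which is why clang emits extra overflow-handling code for this
       one offset type. */
    uint8_t b0 = data[offset], b1 = data[offset + 1];
    return b0 | (b1 << 8);
}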
See https://gcc.godbolt.org/z/6fcnedqPM for a full comparison of the different
offset types.
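
In case it is useful, a self-contained reproducer (the driver and test values
are illustrative; TYPE can be selected with -DTYPE=... on the command line):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifndef TYPE
#define TYPE int   /* also try size_t, ptrdiff_t, unsigned int */
#endif

uint16_t readle(const unsigned char* data, TYPE offset)
{
    uint8_t b0 = data[offset], b1 = data[offset + 1];
    return b0 | (b1 << 8);
}

int main(void)
{
    const unsigned char buf[] = { 0x34, 0x12, 0xcd, 0xab };
    /* Little-endian reads by construction: expect 0x1234 and 0xabcd
       regardless of host endianness. */
    printf("%#x %#x\n", (unsigned)readle(buf, 0), (unsigned)readle(buf, 2));
    return 0;
}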