https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006
            Bug ID: 107006
           Summary: Missing optimization: common idiom for external data
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hpa at zytor dot com
  Target Milestone: ---

Created attachment 53602
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53602&action=edit
C test case source

The only *portable* way in C to deal with external data structures
containing data of a specific endianness, possibly unaligned, is to
operate on them as byte (char) arrays.

At least on x86 (which supports arbitrarily aligned loads), gcc
*sometimes* recognizes these byte-wise accesses as single loads, but
sometimes not.

The included test cases contain a plain C implementation and an
implementation wrapped in a C++ class. Compiling the former with:

    gcc -std=c2x -g -O3 -W -Wall -[cSE] -o bswap.[osi] bswap.c

... recognizes the load idiom for 16-bit numbers but not for 32- or
64-bit numbers.

Compiling the latter with:

    gcc -std=c++20 -g -O3 -W -Wall -[cSE] -o bswapcc.[osi] bswapcc.cc

... *additionally* recognizes the 32-bit load, *but only in the
bigendian case* (that is, it generates a load and a bswap instruction);
whereas in the littleendian -- native -- case, this does not happen!

I am familiar with the use of packed arrays and __builtin_bswap*() for
these accesses, but unfortunately these are gcc-specific.
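
For readers without the attachment at hand, the idiom in question can be sketched roughly as follows. This is an illustrative reconstruction, not the attached test case: the function names (load_le16, load_be32, and so on) are hypothetical and chosen only for this example. Each function assembles an integer from individual bytes using shifts and ORs, so the result is the same on any host regardless of its endianness or alignment rules.

```c
#include <stdint.h>

/* Hypothetical helper names; the attached bswap.c may differ. */

/* Little-endian loads: byte 0 is the least significant byte. */
static inline uint16_t load_le16(const void *p)
{
    const unsigned char *b = p;
    return (uint16_t)(b[0] | (b[1] << 8));
}

static inline uint32_t load_le32(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0]       | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static inline uint64_t load_le64(const void *p)
{
    const unsigned char *b = p;
    /* Two 32-bit halves; the low half comes first in memory. */
    return (uint64_t)load_le32(b) | (uint64_t)load_le32(b + 4) << 32;
}

/* Big-endian loads: byte 0 is the most significant byte. */
static inline uint16_t load_be16(const void *p)
{
    const unsigned char *b = p;
    return (uint16_t)((b[0] << 8) | b[1]);
}

static inline uint32_t load_be32(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8  | (uint32_t)b[3];
}

static inline uint64_t load_be64(const void *p)
{
    const unsigned char *b = p;
    /* Two 32-bit halves; the high half comes first in memory. */
    return (uint64_t)load_be32(b) << 32 | (uint64_t)load_be32(b + 4);
}
```

On x86, the hoped-for code for each little-endian function is a single load of the appropriate width, and for each big-endian function a load followed by a byte swap. Whether a given gcc version actually produces that for this exact spelling of the idiom is not guaranteed and should be checked with -S.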