https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111502
--- Comment #2 from Lasse Collin <lasse.collin at tukaani dot org> ---
Byte access by default is good when the compiler doesn't know whether unaligned access is fast on the target processor. There is no disagreement here.

What I suspect is a bug is the instruction sequence used for byte access in the copy16 and copy32 cases:

copy16 uses 2 * lbu + 2 * sb + 1 * lhu, that is, five memory operations to load one unaligned 16-bit integer.

copy32 uses 4 * lbu + 4 * sb + 1 * lw, that is, nine memory operations to load one unaligned 32-bit integer.

In comparison, bytes16 needs only two memory operations and bytes32 needs four. Clang generates the latter kind of code from both bytesxx and copyxx.
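
For readers without the original report at hand, the following is a minimal sketch of what the two function families might look like. The bodies are an assumption inferred from the names (copyxx presumably goes through memcpy, bytesxx presumably assembles the value from individual bytes); the definitions in the actual report may differ in detail. The two families return the same value only on a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Assumed shape of copyxx: memcpy into a local variable.
   Safe for any alignment; the result is in native byte order. */
uint16_t copy16(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

uint32_t copy32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Assumed shape of bytesxx: explicit little-endian assembly from
   single bytes, i.e. one byte load per input byte and no stores. */
uint16_t bytes16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t bytes32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Compiling such a file to assembly with optimization enabled and counting the loads and stores per function is one way to check the operation counts described above.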