[Bug target/111502] Suboptimal unaligned 2/4-byte memcpy on strict-align targets

lasse.collin at tukaani dot org via Gcc-bugs Wed, 20 Sep 2023 11:37:41 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111502


--- Comment #2 from Lasse Collin <lasse.collin at tukaani dot org> ---
Byte access by default is good when the compiler doesn't know whether unaligned
access is fast on the target processor. There is no disagreement here.

What I suspect is a bug is the instruction sequence used for byte access in
copy16 and copy32 cases. copy16 uses 2 * lbu + 2 * sb + 1 * lhu, that is, five
memory operations to load an unaligned 16-bit integer. copy32 uses 4 * lbu + 4
* sb + 1 * lw, that is, nine memory operations to load a 32-bit integer.

bytes16 needs two memory operations and bytes32 needs four. Clang generates
this kind of code from both bytesxx and copyxx.
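For readers who do not have the test case at hand, here is a rough sketch of
what the four functions might look like. The bodies are not quoted in this
comment, so they are assumptions inferred from the names: bytesxx is taken to
assemble the value from individual bytes in little-endian order, and copyxx to
memcpy into a local variable. check_unaligned_reads is a hypothetical helper
added only to exercise them at an odd address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape: build the value byte by byte (little-endian order). */
uint32_t bytes16(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8);
}

/* Assumed shape: let memcpy do the possibly unaligned load. */
uint32_t copy16(const uint8_t *b)
{
    uint16_t v;
    memcpy(&v, b, sizeof(v));
    return v;
}

uint32_t bytes32(const uint8_t *b)
{
    return (uint32_t)b[0]
        | ((uint32_t)b[1] << 8)
        | ((uint32_t)b[2] << 16)
        | ((uint32_t)b[3] << 24);
}

uint32_t copy32(const uint8_t *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof(v));
    return v;
}

/* Hypothetical self-check: read from buf + 1, which is not 2- or 4-byte
   aligned relative to the start of the buffer. */
void check_unaligned_reads(void)
{
    uint8_t buf[5] = { 0x00, 0x11, 0x22, 0x33, 0x44 };
    uint16_t h = 0xBEEFu;
    uint32_t w = 0xDEADBEEFu;

    /* bytesxx defines the byte order itself, so these hold on any host. */
    assert(bytes16(buf + 1) == 0x2211u);
    assert(bytes32(buf + 1) == 0x44332211u);

    /* copyxx uses host byte order, so round-trip through memcpy instead. */
    memcpy(buf + 1, &h, sizeof(h));
    assert(copy16(buf + 1) == 0xBEEFu);
    memcpy(buf + 1, &w, sizeof(w));
    assert(copy32(buf + 1) == 0xDEADBEEFu);
}
```

With functions of this shape, the instruction counts above can be compared by
compiling for a strict-align target and reading the generated assembly.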

[Bug target/111502] Suboptimal unaligned 2/4-byte memcpy on strict-align targets
