https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91674
Bug ID: 91674
Summary: [ARM/thumb] redundant memcpy does not get optimized away on thumb
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: andij.cr at gmail dot com
Target Milestone: ---

Consider this C++ function:

    #include <cstring>
    #include <array>
    #include <cstdint>

    auto to_bytes(uint32_t arg){
        std::array<uint8_t, sizeof(arg)> out{};
        std::memcpy(out.data(), &arg, sizeof(arg));
        return out;
    }

On a little-endian architecture this function could be a no-op.

Compiled with g++ -Os (x86-64) we get:

    to_bytes(unsigned int):
            mov     eax, edi
            ret

On ARM this partly works. Compiled with arm-none-eabi-g++ -Os:

    to_bytes(unsigned int):
            sub     sp, sp, #8
            add     sp, sp, #8
            bx      lr

Notice the redundant sub followed by an add.

But if Thumb is forced (ARMv7-M supports only the Thumb instruction set, so -march=armv7-m selects Thumb code), the full optimization is not performed. Compiled with arm-none-eabi-g++ -Os -march=armv7-m -mtune=cortex-m3:

    to_bytes(unsigned int):
            mov     r3, r0
            movs    r0, #0
            uxtb    r2, r3
            bfi     r0, r2, #0, #8
            ubfx    r2, r3, #8, #8
            bfi     r0, r2, #8, #8
            ubfx    r2, r3, #16, #8
            bfi     r0, r2, #16, #8
            lsrs    r3, r3, #24
            sub     sp, sp, #8
            bfi     r0, r3, #24, #8
            add     sp, sp, #8
            bx      lr

In contrast, cross-compiling with Clang 7 produces the desired optimization. Compiled with clang++7 --target=arm-none-eabi -march=armv7-m -mtune=cortex-m3:

    to_bytes(unsigned int):
            bx      lr

Notice also that there is no redundant stack pointer manipulation.
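The Thumb-2 listing rebuilds r0 one byte at a time (uxtb/ubfx/lsrs to extract each byte of the input, bfi to insert it at the same bit offset), so the whole sequence amounts to an identity on the argument. A small host-side C++ sketch can check that reasoning. `model_thumb_sequence` is a hypothetical hand model of the emitted instructions, not compiler output; `array_as_u32` assumes, per the AAPCS, that a 4-byte aggregate is returned in r0 with the same byte layout it has in memory. `host_is_little_endian`, `array_as_u32`, `model_thumb_sequence` and `self_check` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// The function from the report: copy the object representation of a
// 32-bit value into a 4-byte array.
auto to_bytes(std::uint32_t arg) {
    std::array<std::uint8_t, sizeof(arg)> out{};
    std::memcpy(out.data(), &arg, sizeof(arg));
    return out;
}

// True when the host stores the least significant byte at the lowest address.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical hand model of the Thumb-2 sequence GCC emitted (not compiler
// output): start from r0 = 0, then for each byte extract it from the input
// (uxtb/ubfx/lsrs) and insert it with bfi at the same bit offset.
std::uint32_t model_thumb_sequence(std::uint32_t r3) {
    std::uint32_t r0 = 0;                                   // movs r0, #0
    for (unsigned shift = 0; shift < 32; shift += 8) {
        const std::uint32_t mask = std::uint32_t{0xFFu} << shift;
        const std::uint32_t byte = (r3 >> shift) & 0xFFu;   // extract byte
        r0 &= ~mask;                                        // bfi: clear field
        r0 |= byte << shift;                                // bfi: insert byte
    }
    return r0;
}

// Assumption (AAPCS): the 4-byte array comes back in r0 with its memory
// layout, so reading it as a uint32_t gives the value the caller sees in r0.
std::uint32_t array_as_u32(const std::array<std::uint8_t, 4>& bytes) {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value;
}

// Checks that the memcpy round-trips and that the modelled sequence is an
// identity; the byte-order check only runs on a little-endian host.
void self_check() {
    const std::uint32_t samples[] = {0u, 0x04030201u, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(array_as_u32(to_bytes(x)) == x);
        assert(model_thumb_sequence(x) == x);
    }
    if (host_is_little_endian()) {
        const auto b = to_bytes(0x04030201u);
        assert(b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04);
    }
}
```

On a little-endian host, `to_bytes(0x04030201u)` yields the bytes 01 02 03 04 in that order, which is exactly the memory image of the register. That is why both the memcpy and the byte-by-byte rebuild can collapse to returning r0 unchanged, as the Clang output does.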