https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91674

            Bug ID: 91674
           Summary: [ARM/thumb] redundant memcpy does not get optimized
                    away on thumb
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andij.cr at gmail dot com
  Target Milestone: ---

Consider this C++ function:

#include <cstring>
#include <array>
#include <cstdint>
auto to_bytes(uint32_t arg){
    std::array<uint8_t, sizeof(arg)> out{};
    std::memcpy(out.data(), &arg, sizeof(arg));
    return out;
}

On a little-endian architecture this function could be a no-op.
Compiled with g++ -Os (x86-64), we get:
to_bytes(unsigned int):
        mov     eax, edi
        ret 
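Here is why it can be a no-op. std::array<uint8_t, 4> is a 4-byte aggregate, and both the x86-64 System V ABI and the AAPCS return it in a single register (eax / r0). Under the AAPCS its bytes are laid out in r0 as if stored to a word-aligned address and reloaded with LDR. On a little-endian target the memcpy puts the least significant byte of arg into out[0], so the returned register ends up holding exactly the bits of the argument register.

The sketch below checks that relationship on the host. load_le is a hypothetical helper, not part of the report. It models the little-endian word load, and the check only holds on a little-endian host.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// The function from the report, unchanged (needs C++14 for the deduced return type).
auto to_bytes(uint32_t arg) {
    std::array<uint8_t, sizeof(arg)> out{};
    std::memcpy(out.data(), &arg, sizeof(arg));
    return out;
}

// Hypothetical helper (not from the report): interpret four bytes as a
// little-endian word, i.e. the value a little-endian LDR of these bytes
// would produce. b[0] becomes the least significant byte.
uint32_t load_le(const std::array<uint8_t, 4>& b) {
    return static_cast<uint32_t>(b[0])
         | static_cast<uint32_t>(b[1]) << 8
         | static_cast<uint32_t>(b[2]) << 16
         | static_cast<uint32_t>(b[3]) << 24;
}
```

On a little-endian host, load_le(to_bytes(x)) == x for every x. In register terms, that means the compiler only has to return the argument register unchanged.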

On ARM this partly works.
Compiled with arm-none-eabi-g++ -Os:
to_bytes(unsigned int):
        sub     sp, sp, #8
        add     sp, sp, #8
        bx      lr

Notice the redundant sub followed by an add: an 8-byte stack frame is allocated and immediately released.

But if Thumb is forced (ARMv7-M supports only the Thumb instruction set), the memcpy is not optimized away, and the value is instead reassembled byte by byte.
Compiled with arm-none-eabi-g++ -Os -march=armv7-m -mtune=cortex-m3:
to_bytes(unsigned int):
        mov     r3, r0
        movs    r0, #0
        uxtb    r2, r3
        bfi     r0, r2, #0, #8
        ubfx    r2, r3, #8, #8
        bfi     r0, r2, #8, #8
        ubfx    r2, r3, #16, #8
        bfi     r0, r2, #16, #8
        lsrs    r3, r3, #24
        sub     sp, sp, #8
        bfi     r0, r3, #24, #8
        add     sp, sp, #8
        bx      lr
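Transcribing the Thumb sequence into C++ shows why it is redundant. Each byte is extracted from the argument and inserted back at the same bit position, so the result equals the input.

The names ubfx, bfi and thumb_to_bytes are hypothetical and were chosen for this sketch. The two helpers model the ARMv7-M UBFX/BFI semantics only for bit-field widths below 32, which covers every width used here. The stack adjustment is left out because it has no effect on r0.

```cpp
#include <cstdint>

// Hypothetical model of UBFX: extract `width` bits of src starting at bit `lsb`.
// Assumes 0 < width < 32 (a 32-bit width would make the shift undefined).
constexpr uint32_t ubfx(uint32_t src, unsigned lsb, unsigned width) {
    return (src >> lsb) & ((1u << width) - 1u);
}

// Hypothetical model of BFI: insert the low `width` bits of src into dst at `lsb`.
// Same width assumption as ubfx.
constexpr uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width) {
    const uint32_t mask = ((1u << width) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

// Instruction-by-instruction transcription of the Thumb output above.
constexpr uint32_t thumb_to_bytes(uint32_t r0) {
    uint32_t r3 = r0;                 // mov   r3, r0
    r0 = 0;                           // movs  r0, #0
    uint32_t r2 = r3 & 0xFFu;         // uxtb  r2, r3
    r0 = bfi(r0, r2, 0, 8);           // bfi   r0, r2, #0, #8
    r2 = ubfx(r3, 8, 8);              // ubfx  r2, r3, #8, #8
    r0 = bfi(r0, r2, 8, 8);           // bfi   r0, r2, #8, #8
    r2 = ubfx(r3, 16, 8);             // ubfx  r2, r3, #16, #8
    r0 = bfi(r0, r2, 16, 8);          // bfi   r0, r2, #16, #8
    r3 = r3 >> 24;                    // lsrs  r3, r3, #24
    r0 = bfi(r0, r3, 24, 8);          // bfi   r0, r3, #24, #8
    return r0;                        // bx    lr
}

// Every byte lands back where it came from, so the sequence is the identity.
static_assert(thumb_to_bytes(0x04030201u) == 0x04030201u, "identity");
static_assert(thumb_to_bytes(0xDEADBEEFu) == 0xDEADBEEFu, "identity");
```

Because the compile-time checks pass, the ten instructions before bx lr compute nothing new. A single bx lr, as in the Clang output, would be equivalent.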

In contrast, cross-compiling with Clang 7 produces the desired optimization.
Compiled with clang++7 --target=arm-none-eabi -march=armv7-m -mtune=cortex-m3:
to_bytes(unsigned int):
        bx      lr

Note also that there is no redundant stack-pointer manipulation.
