https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90285

            Bug ID: 90285
           Summary: Poor optimised codegen for memmove() back on top of
                    oneself
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

The following code produces poor optimised codegen on trunk GCC at the time of
writing (2019-04-29):

// Reinterprets a T into its array of bytes
// Currently defined behaviour in C++ 20 for
// trivially copyable types only. The proposal
// would be that this would become defined 
// behaviour for most possible C++ types.
template<class T>
constexpr inline byte_array_ref<T> detach_cast(T &v) noexcept
{
    byte_array_ref<T> ret = reinterpret_cast<byte_array_ref<T>>(v);
    byte temp[sizeof(T)];

    // Reinterpret bytes by copying (not UB for TC types)
    memmove(temp, &v, sizeof(T));

    // Put reinterpreted bytes back. This avoids the UB
    // of reinterpret casting without creating new objects.
    memmove(ret, temp, sizeof(T));
    return ret;
}

You can see GCC's codegen here (it performs two copies of 40 KB each):
https://godbolt.org/z/sJWSc1

You can see clang's codegen here (which is optimal, nothing is copied):
https://godbolt.org/z/ou8VFT
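
For anyone who does not want to follow the links, a self-contained test case might look like the sketch below. The snippet above does not show the definition of byte_array_ref, so the alias used here (a reference to an array of sizeof(T) bytes) is an assumption, as are the Big type and the first_byte() wrapper, which exist only for illustration. constexpr is dropped because reinterpret_cast cannot appear in a constant expression.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: byte_array_ref is not defined in the report. A reference to
// an array of sizeof(T) bytes is one plausible definition.
template <class T>
using byte_array_ref = std::byte (&)[sizeof(T)];

template <class T>
inline byte_array_ref<T> detach_cast(T &v) noexcept
{
    byte_array_ref<T> ret = reinterpret_cast<byte_array_ref<T>>(v);
    std::byte temp[sizeof(T)];

    // Copy the object representation out to a temporary...
    std::memmove(temp, &v, sizeof(T));

    // ...and copy the same bytes back over the original storage.
    std::memmove(ret, temp, sizeof(T));
    return ret;
}

// Hypothetical trivially copyable type, 40960 bytes when int is 4 bytes.
struct Big
{
    int values[10240];
};

// Non-inline entry point so the generated code is easy to find in the
// compiler's assembly output.
std::byte *first_byte(Big &b) noexcept
{
    return detach_cast(b);
}
```

Compiling this with optimisation enabled (for example -O2) and inspecting the assembly for first_byte() should show whether the two copies survive. The observable contents of b are the same before and after the call.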

I think GCC ought not to perform any memory copies for the above code
sequence, since copying the bytes out and straight back leaves v unchanged.

Niall
