https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90285

            Bug ID: 90285
           Summary: Poor optimised codegen for memmove() back on top of oneself
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

The following code produces poor optimised codegen on trunk GCC at the time of
writing (2019-04-29):

// Reinterprets a T into its array of bytes
// Currently defined behaviour in C++ 20 for
// trivially copyable types only. The proposal
// would be that this would become defined
// behaviour for most possible C++ types.
template<class T>
constexpr inline byte_array_ref<T> detach_cast(T &v) noexcept
{
  byte_array_ref<T> ret = reinterpret_cast<byte_array_ref<T>>(v);
  byte temp[sizeof(T)];
  // Reinterpret bytes by copying (not UB for TC types)
  memmove(temp, &v, sizeof(T));
  // Put reinterpreted bytes back. This avoids the UB
  // of reinterpret casting without creating new objects.
  memmove(ret, temp, sizeof(T));
  return ret;
}

You can see GCC's codegen here (it does two copies of 40Kb):
https://godbolt.org/z/sJWSc1

You can see clang's codegen here (which is optimal, nothing is copied):
https://godbolt.org/z/ou8VFT

I think GCC ought to not perform memory copies for the above code sequence.

Niall
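
For anyone wanting to reproduce this without the godbolt links, the snippet above does not show the definition of byte_array_ref<T>, so here is a minimal self-contained sketch. It assumes byte_array_ref<T> is a reference to an array of sizeof(T) std::byte, which may not match the reporter's actual definition. The Big type and the check_roundtrip() helper are hypothetical names introduced only for illustration, and constexpr is dropped because neither reinterpret_cast nor memmove can be evaluated in a constant expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: byte_array_ref<T> is a reference to an array of sizeof(T)
// bytes. The report does not show its real definition.
template <class T>
using byte_array_ref = std::byte (&)[sizeof(T)];

// Same copy-out / copy-back pattern as in the report.
template <class T>
inline byte_array_ref<T> detach_cast(T &v) noexcept
{
  byte_array_ref<T> ret = reinterpret_cast<byte_array_ref<T>>(v);
  std::byte temp[sizeof(T)];
  // Copy the object's bytes out to a temporary buffer...
  std::memmove(temp, &v, sizeof(T));
  // ...and copy the same bytes straight back over the object.
  std::memmove(ret, temp, sizeof(T));
  return ret;
}

// Hypothetical 40 KB trivially copyable type, matching the copy size
// mentioned in the report.
struct Big
{
  char data[40 * 1024];
};

// Hypothetical helper: checks that the round trip is observably a no-op,
// i.e. the returned array aliases b and b's bytes are unchanged.
inline void check_roundtrip(Big &b)
{
  const Big before = b;
  byte_array_ref<Big> bytes = detach_cast(b);
  assert(static_cast<void *>(&bytes) == static_cast<void *>(&b));
  assert(std::memcmp(&before, &b, sizeof(Big)) == 0);
}
```

Since the two memmove() calls leave the object's bytes exactly as they were, an optimiser is in principle free to drop both copies. Compiling a call to detach_cast() on a Big with optimisation enabled and inspecting the assembly should show whether the copies survive.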