https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
#include <algorithm>

void copy2(char* out, const char* in, SIZE_T n)
{
  std::copy(in, in+n, out);
}

At -O3:

copy2(char*, char const*, unsigned long):
        test    rdx, rdx
        jne     .L21
        ret
.L21:
        jmp     memmove

#include <memory>

void copy3(char* out, const char* in, SIZE_T n)
{
  std::uninitialized_copy(in, in+n, out);
}

At -O3:

copy3(char*, char const*, unsigned long):
        test    rdx, rdx
        jne     .L21
        ret
.L21:
        jmp     memmove

Even with -O1 these beat your loop hands down:

copy2(char*, char const*, unsigned long):
        test    rdx, rdx
        jne     .L11
        ret
.L11:
        sub     rsp, 8
        call    memmove
        add     rsp, 8
        ret

copy3(char*, char const*, unsigned long):
        test    rdx, rdx
        jne     .L18
        ret
.L18:
        sub     rsp, 8
        call    memmove
        add     rsp, 8
        ret
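
For anyone who wants to check that the library calls above are drop-in replacements for a hand-written loop, here is a minimal sketch. The loop in copy_loop is only an assumed stand-in for the reporter's loop, which is not quoted in this comment. std::size_t is used in place of SIZE_T, and copies_agree is a hypothetical helper, not part of the bug report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Assumed shape of a hand-written byte-copy loop (not taken from the report).
void copy_loop(char* out, const char* in, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i];
}

// Same bodies as copy2 and copy3 in the comment, with std::size_t for SIZE_T.
void copy2(char* out, const char* in, std::size_t n)
{
  std::copy(in, in + n, out);
}

void copy3(char* out, const char* in, std::size_t n)
{
  std::uninitialized_copy(in, in + n, out);
}

// Hypothetical helper: copies the first n bytes of `in` with all three
// functions and reports whether the results are byte-for-byte identical.
bool copies_agree(const char* in, std::size_t n)
{
  char a[64] = {}, b[64] = {}, c[64] = {};
  assert(n <= sizeof a);  // the scratch buffers are fixed-size
  copy_loop(a, in, n);
  copy2(b, in, n);
  copy3(c, in, n);
  return std::memcmp(a, b, n) == 0 && std::memcmp(a, c, n) == 0;
}
```

Calling copies_agree("hello, world", 12) should return true. This only checks that the three functions behave the same; whether a given compiler turns the library calls into a memmove tail call, as in the listings above, depends on the GCC and libstdc++ version and the optimization level, so inspect the generated assembly (for example with -S) rather than assuming it.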