https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055
--- Comment #10 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Wilco from comment #9)
> (In reply to H.J. Lu from comment #8)
> > Inlining mempcpy uses a callee-saved register:
> >
> > ...
> >
> > Not inlining mempcpy is preferred.
>
> If codesize is the only thing that matters... The cost is not at the caller
> side but in requiring a separate mempcpy function which causes extra I-cache
> misses. The only case where mempcpy makes sense is if you can use a shared
> implementation with zero overhead to memcpy.

Some archs have gone to extra effort to implement an optimized mempcpy in
glibc. There is no reason not to use it. Sharing instructions between memcpy
and mempcpy belongs to a different discussion.
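For readers following the thread, the two forms being compared can be sketched in C. This is a minimal illustration, assuming glibc (mempcpy is a GNU extension, so `_GNU_SOURCE` must be defined). `append_call`, `append_memcpy` and `join` are hypothetical names, not GCC or glibc code. Both forms return `dest + n`. In the memcpy form, though, `n` has to survive the call so it can be added to memcpy's return value. On ABIs where the length argument is passed in a caller-saved register (x86-64 SysV, for example), the caller typically keeps it in a callee-saved register instead, which is the cost described in comment #8.

```c
#define _GNU_SOURCE   /* mempcpy is a GNU extension declared in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Form 1: call mempcpy directly. It already returns dest + n, so the
   caller needs nothing extra after the call. */
static char *append_call(char *dest, const char *src, size_t n)
{
    return mempcpy(dest, src, n);
}

/* Form 2: the "inlined" expansion as memcpy plus an add. memcpy returns
   dest, so n must stay live across the call to compute dest + n. */
static char *append_memcpy(char *dest, const char *src, size_t n)
{
    return (char *)memcpy(dest, src, n) + n;
}

/* Hypothetical helper: concatenate a and b into buf using the given
   append function and return the number of bytes written (excluding NUL). */
static size_t join(char *buf, const char *a, const char *b,
                   char *(*append)(char *, const char *, size_t))
{
    char *p = append(buf, a, strlen(a));
    p = append(p, b, strlen(b));
    *p = '\0';
    return (size_t)(p - buf);
}
```

The returned end pointer is what makes mempcpy convenient for chained copies like `join`. Whether GCC should expand it as memcpy plus an add or call the library routine is exactly the trade-off argued over in this thread.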