https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77610
--- Comment #2 from Rich Felker <bugdal at aerifal dot cx> ---

Unless you expect the inline memcpy to be a size savings (and it does not seem to be), the size threshold can just be chosen such that the function call time is negligible compared to the copying time. I suspect that's already true at around 256 bytes or so. I'm testing a patch where I used 256 as the limit; it made the Linux kernel very slightly faster (~1-2%) and does not seem to hurt performance anywhere. Major differences are unlikely to be seen unless the library memcpy does something fancy like DMA (or just avoids aliasing in direct-mapped caches).
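
To make the threshold idea concrete, here is a rough C sketch of the dispatch being discussed. It is not the patch itself: the names INLINE_COPY_LIMIT, copy_inline and copy_with_threshold are hypothetical, the byte loop only stands in for whatever inline sequence the compiler would emit, and treating 256 as an exclusive upper bound for the inline path is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: 256 is the limit tried in the comment above.
 * Whether the boundary is inclusive or exclusive is an assumption. */
#define INLINE_COPY_LIMIT 256

/* Hypothetical stand-in for compiler-expanded inline copy code. */
static inline void copy_inline(unsigned char *dst,
                               const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy n bytes from src to dst (regions must not overlap).
 * Small sizes stay inline; at or above the limit, the cost of the
 * call is assumed to be negligible next to the copy itself, so the
 * library memcpy is used. */
static void *copy_with_threshold(void *dst, const void *src, size_t n)
{
    if (n < INLINE_COPY_LIMIT) {
        copy_inline(dst, src, n);
        return dst;
    }
    return memcpy(dst, src, n);
}
```

Note that an optimizing compiler may itself recognize the byte loop and turn it into a memcpy call, so this only illustrates where the cutoff sits, not the generated code.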