On 11/21/18 11:16 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much... *inlining* the erms case might show that nicely in
>> profiles.
>
> Side note: the fact that Jens' patch (which I don't like in that form)
> allegedly shrunk the resulting kernel binary would seem to indicate
> that there's a *lot* of compile-time constant-sized memcpy calls that
> we are missing, and that fall back to copy_user_generic().
Other kind of side note... This also affects memset(), which does
rep stosb if we have ERMS, for any size memset. I noticed this from
sg_init_table(), which does a memset of the table. For my kind of
testing, the entry size is small. The below, too, reduces memset()
overhead by 50% here for me.


diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9bc861c71e75..bad0fdb9ddcd 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(__memset)
  * rax   original destination
  */
 ENTRY(memset_erms)
+	cmpl $128,%edx
+	jb memset_orig
 	movq %rdi,%r9
 	movb %sil,%al
 	movq %rdx,%rcx

-- 
Jens Axboe
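
For readers who don't read x86 assembly, the idea of the patch can be
sketched in plain C: requests below a size cutoff skip the string
instruction path and use a simple store loop instead. This is only an
illustration, not kernel code. The names memset_small(), memset_bulk(),
memset_dispatch() and SMALL_MEMSET_CUTOFF are hypothetical, the store
loop merely stands in for memset_orig, and libc memset() stands in for
the rep stosb path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, mirroring the 128 used in the patch above. */
#define SMALL_MEMSET_CUTOFF 128

/* Stand-in for memset_orig: plain stores, no string-instruction setup. */
static void *memset_small(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Stand-in for the rep stosb path; libc memset() is used here
 * only as an assumption to keep the sketch portable. */
static void *memset_bulk(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

/* Pick the path by size, as the cmpl/jb pair does in the patch. */
void *memset_dispatch(void *dst, int c, size_t n)
{
	if (n < SMALL_MEMSET_CUTOFF)
		return memset_small(dst, c, n);
	return memset_bulk(dst, c, n);
}
```

Both paths produce the same result; only the cost of getting there
differs, which is why the check can be a cheap compare and branch at
function entry.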