On 11/21/18 11:16 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much...  *inlining* the erms case might show that nicely in
>> profiles.
> 
> Side note: the fact that Jens' patch (which I don't like in that form)
> allegedly shrunk the resulting kernel binary would seem to indicate
> that there's a *lot* of compile-time constant-sized memcpy calls that
> we are missing, and that fall back to copy_user_generic().

Another side note: this also affects memset(), which uses rep stosb
for memsets of any size if we have ERMS. I noticed this from
sg_init_table(), which does a memset of the table. For my kind of
testing, the entry size is small. The patch below also reduces
memset() overhead by 50% here for me.

diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9bc861c71e75..bad0fdb9ddcd 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(__memset)
  * rax   original destination
  */
 ENTRY(memset_erms)
+       cmpl $128,%edx
+       jb memset_orig
        movq %rdi,%r9
        movb %sil,%al
        movq %rdx,%rcx
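
For readers who don't speak x86 assembly, the two added instructions
amount to a size check in front of the ERMS path. A rough C sketch of
the same dispatch, assuming hypothetical stand-ins memset_small() (for
memset_orig) and memset_bulk() (for the rep stosb body); these names
and the last_path marker are illustrative only and not kernel code:

```c
#include <stddef.h>
#include <string.h>

/* Cutoff taken from the patch: sizes below this skip the ERMS path. */
#define ERMS_THRESHOLD 128

/* Demonstration only: records which path the last call took.
 * 0 = small path, 1 = bulk path. */
static int last_path;

/* Hypothetical stand-in for memset_orig: a plain store loop. */
static void *memset_small(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Hypothetical stand-in for the rep stosb body of memset_erms;
 * the libc memset is used here only to keep the sketch portable. */
static void *memset_bulk(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

/* Mirrors "cmpl $128,%edx; jb memset_orig": an unsigned compare
 * that sends small sizes to the non-ERMS implementation. */
static void *memset_dispatch(void *dst, int c, size_t n)
{
	if (n < ERMS_THRESHOLD) {
		last_path = 0;
		return memset_small(dst, c, n);
	}
	last_path = 1;
	return memset_bulk(dst, c, n);
}
```

A small table, as in the sg_init_table() case, would take the small
path here, while anything of 128 bytes or more still goes through the
bulk path.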

-- 
Jens Axboe
