> On Nov 23, 2018, at 10:42 AM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> 
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>> 
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
> 
> Something like this?
> 
> ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
> compile, it might not work.
> 
> And this doesn't actually do the memset_io() function at all, just the
> memcpy ones.
> 
> Finally, it's worth noting that on x86, we have this:
> 
>  /*
>   * override generic version in lib/iomap_copy.c
>   */
>  ENTRY(__iowrite32_copy)
>          movl %edx,%ecx
>          rep movsd
>          ret
>  ENDPROC(__iowrite32_copy)
> 
> because back in 2006, we did this:
> 
>    [PATCH] Add faster __iowrite32_copy routine for x86_64
> 
>    This assembly version is measurably faster than the generic version in
>    lib/iomap_copy.c.
> 
> which actually implies that "rep movsd" is faster than doing
> __raw_writel() by hand.
> 
> So it is possible that this should all be arch-specific code rather
> than that butt-ugly "generic" code I wrote in this patch.
> 
> End result: I'm not really all that happy about this patch, but it's
> perhaps worth testing, and it's definitely worth discussing. Because
> our current memcpy_{to,from}io() is truly broken garbage.
> 

What is memcpy_toio() even supposed to do?  I’m guessing it’s defined as
something like “copy this data to IO space using at most long-sized writes, all
aligned, and writing each byte exactly once, in order.”  That sounds...
dubiously useful.  I could see a use for a function that writes to aligned
memory in specified-sized chunks.  And I can see a use for a function that just
writes in whatever chunk size the architecture thinks is fastest, and *that*
should probably use MOVDIR64B.
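
To make the guess concrete, here is a rough userspace sketch of the first two variants as I read them.  The names sketch_memcpy_toio() and sketch_copy32() are made up for illustration, a volatile store stands in for an MMIO write, and none of this is meant to be what the kernel actually does or should do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model, NOT kernel code.  A volatile store
 * stands in for a write to IO space.
 *
 * Guessed semantics: at most long-sized writes, every write naturally
 * aligned, every byte written exactly once, in ascending order.
 */
static void sketch_memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile unsigned char *d = dst;
	const unsigned char *s = src;

	/* Byte writes until the destination is long-aligned. */
	while (n && ((uintptr_t)d % sizeof(unsigned long))) {
		*d++ = *s++;
		n--;
	}

	/*
	 * Aligned long-sized writes.  The source may still be unaligned,
	 * so load each value with memcpy() rather than a pointer cast.
	 */
	while (n >= sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, s, sizeof(v));
		*(volatile unsigned long *)d = v;
		d += sizeof(v);
		s += sizeof(v);
		n -= sizeof(v);
	}

	/* Trailing bytes, still in ascending order. */
	while (n--)
		*d++ = *s++;
}

/*
 * The "caller specifies the chunk size" variant, fixed at 32 bits here:
 * copy `count` 32-bit units to a 4-byte-aligned destination, one
 * 32-bit store per unit.
 */
static void sketch_copy32(volatile uint32_t *dst, const uint32_t *src,
			  size_t count)
{
	while (count--)
		*dst++ = *src++;
}
```

The third variant, "whatever the architecture thinks is fastest", is deliberately not sketched: it would make no promise about write size or ordering, which is exactly why it could use something like MOVDIR64B.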

Or is there some subtlety I’m missing?
