On Thu, Nov 22, 2018 at 10:07 AM Andy Lutomirski <l...@kernel.org> wrote:
>
> I'm not personally volunteering, but I suspect we can do much better
> than we do now:
>
>  - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC
>    and UC memory.
>
>  - MOVNTDQA can, I think, do 64-byte loads, but only from WC memory.

No, performance isn't the _primary_ issue. Nobody uses MMIO and
expects high performance from the generic functions (but people may
then tweak individual drivers to do tricks).

And we've historically had various broken hardware that cares deeply
about access size. Trying to be clever and do big accesses could
easily break something.

The fact that nobody has complained about the generic memcpy routines
probably means that the broken hardware isn't in use any more, or it
just works anyway. And nobody has complained about performance either,
so it's clearly not a huge issue. "rep movs" probably works ok on WC
memory writes anyway, it's the UC case that is bad, but I don't think
anybody uses UC and then does the "memcpy_to/fromio()" things. If you
have UC memory, you tend to do the accesses properly.

So I suspect we should just write memcpy_{to,from}io() in terms of
writel/readl.

Oh, and I just noticed that on x86 we expressly use our old "safe and
sane" functions: see __inline_memcpy(), and its use in
__memcpy_{from,to}io().

So the "falls back to memcpy" was always a red herring. We don't
actually do that.

Which explains why things work.

                Linus
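
For illustration only, here is a rough userspace sketch of what
"memcpy_{to,from}io() in terms of writel/readl" could look like. It is
not the kernel's implementation. The sketch_* names are hypothetical
stand-ins for readl()/writel()/readb()/writeb(), they do plain volatile
accesses in native byte order, and the sketch assumes the device-side
pointer is 4-byte aligned. The point is only that every access has a
fixed, predictable size: 32-bit words for the bulk, single bytes for
the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel accessors: one volatile access each. */
static inline uint32_t sketch_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void sketch_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

static inline uint8_t sketch_readb(const volatile void *addr)
{
	return *(const volatile uint8_t *)addr;
}

static inline void sketch_writeb(uint8_t val, volatile void *addr)
{
	*(volatile uint8_t *)addr = val;
}

/* Copy n bytes from "device" memory using only 32-bit and 8-bit reads. */
static void sketch_memcpy_fromio(void *dst, const volatile void *src, size_t n)
{
	unsigned char *d = dst;
	const volatile uint8_t *s = src;

	/* Assumption of this sketch: the device address is 4-byte aligned. */
	assert(((uintptr_t)s & 3) == 0);

	while (n >= 4) {
		uint32_t v = sketch_readl(s);

		memcpy(d, &v, 4);	/* the RAM side may be unaligned */
		s += 4;
		d += 4;
		n -= 4;
	}
	while (n--)			/* 0 to 3 trailing bytes, one at a time */
		*d++ = sketch_readb(s++);
}

/* Copy n bytes to "device" memory using only 32-bit and 8-bit writes. */
static void sketch_memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = dst;
	const unsigned char *s = src;

	/* Same alignment assumption as above. */
	assert(((uintptr_t)d & 3) == 0);

	while (n >= 4) {
		uint32_t v;

		memcpy(&v, s, 4);	/* the RAM side may be unaligned */
		sketch_writel(v, d);
		d += 4;
		s += 4;
		n -= 4;
	}
	while (n--)			/* 0 to 3 trailing bytes, one at a time */
		sketch_writeb(*s++, d++);
}
```

A real version would also have to decide what to do with an unaligned
device address and with hardware that only tolerates one access width,
which is exactly the "cares deeply about access size" problem above.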