On Tue, May 1, 2018 at 9:14 PM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 9:00 PM Dan Williams <dan.j.willi...@intel.com> wrote:
>> >
>> > I have some dim memory of "rep movs doesn't work well for pmem", but does
>> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
>> > designed it, and why is anybody using it?
>> >
>
>> I think this is an FAQ from the original submission, in fact some guy
>> named "Linus Torvalds" asked [1]:
>
> Oh, I already mentioned that I remembered that "rep movs" didn't work well.
>
> But there's a big gap between "just use 'rep movs' and 'do some cacheline
> unrolling'".
>
> Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
> the unrolling that it will already do on its own?
>
> I may have gotten that answered too, but there's no comment in the code
> about why it's such a disgusting mess, so I've long since forgotten _why_
> it's such a disgusting mess.
>
> That loop unrolling _used_ to be "hey, it's simple".
>
> Now it's "Hey, that's truly disgusting", with the separate fault handling
> for every single case in the unrolled loop.
>
> Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
> labels, and getting the number of bytes copied right.
>
> And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
> COULD GET RID OF ALL OF THOSE?"
>
> Maybe you already did ask yourself. But I'm asking because it sure isn't
> explained in the code.

Ah, sorry. Yeah, I don't see a good reason to keep the unrolling. It would definitely clean up the fault handling, I'll respin.
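
For illustration only, here is a rough userspace sketch of the shape being asked for: a word-at-a-time loop with a single shared fault exit instead of one exception-table entry per unrolled load. This is not the kernel code and not the respin. The names `poison`, `load_may_fault()` and `copy_word_at_a_time()` are hypothetical, and the "fault" is simulated with a poisoned-byte pointer rather than a real machine check. Falling back to a byte loop after a faulting word is also an assumption, used here so the returned count of uncopied bytes is exact.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: address of a simulated poisoned source byte (NULL = none). */
static const unsigned char *poison;

/*
 * Hypothetical stand-in for a load that can fault: fails with -1 if
 * [src, src + len) covers the poisoned byte, otherwise copies into dst.
 */
static int load_may_fault(void *dst, const unsigned char *src, size_t len)
{
	if (poison && poison >= src && poison < src + len)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy len bytes one 8-byte word at a time, with no manual unrolling.
 * Returns the number of bytes NOT copied (0 on full success).
 */
static size_t copy_word_at_a_time(unsigned char *dst,
				  const unsigned char *src, size_t len)
{
	uint64_t word;
	unsigned char byte;

	while (len >= sizeof(word)) {
		/* Single fault exit for the whole loop. */
		if (load_may_fault(&word, src, sizeof(word)))
			break;
		memcpy(dst, &word, sizeof(word));
		src += sizeof(word);
		dst += sizeof(word);
		len -= sizeof(word);
	}

	/* Tail bytes, or a bytewise retry of the word that faulted. */
	while (len) {
		if (load_may_fault(&byte, src, 1))
			break;
		*dst++ = byte;
		src++;
		len--;
	}

	return len;
}
```

With one loop body there is only one place a word load can fault, so the bytes-remaining accounting lives in a single spot instead of in eight separate error labels.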