On Tue, May 1, 2018 at 9:00 PM Dan Williams <dan.j.willi...@intel.com> wrote:
> >
> > I have some dim memory of "rep movs doesn't work well for pmem", but does
> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
> > designed it, and why is anybody using it?
> >
> I think this is an FAQ from the original submission, in fact some guy
> named "Linus Torvalds" asked [1]:

Oh, I already mentioned that I remembered that "rep movs" didn't work well.
But there's a big gap between "just use 'rep movs'" and "do some cacheline
unrolling".

Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
the unrolling that it will already do on its own?

I may have gotten that answered too, but there's no comment in the code
about why it's such a disgusting mess, so I've long since forgotten _why_
it's such a disgusting mess.

That loop unrolling _used_ to be "hey, it's simple". Now it's "Hey, that's
truly disgusting", with the separate fault handling for every single case
in the unrolled loop.

Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
labels, and getting the number of bytes copied right.

And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
COULD GET RID OF ALL OF THOSE?"

Maybe you already did ask yourself. But I'm asking because it sure isn't
explained in the code.

             Linus
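
For illustration only, here is a rough user-space sketch of the shape the question above is pointing at: a plain word-at-a-time loop with a single fault exit, where the "bytes not copied" count falls out of one running offset. This is not the kernel code under discussion. The names copy_word_at_a_time() and load_word(), and the poison_off parameter that simulates a faulting load, are hypothetical and exist only for this sketch. Real kernel code would use exception table entries rather than a return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a faulting load: reading the word that starts
 * at byte offset poison_off "fails". Pass (size_t)-1 for "never fault".
 */
static int load_word(const unsigned char *src, size_t off, size_t poison_off,
                     uint64_t *out)
{
    if (off == poison_off)
        return -1;                      /* simulated fault on this load */
    memcpy(out, src + off, sizeof(*out));
    return 0;
}

/*
 * Copy len bytes one 8-byte word at a time, then finish with a byte tail.
 * Returns the number of bytes NOT copied (0 on success).
 * Only the word loop simulates faults in this sketch.
 */
static size_t copy_word_at_a_time(unsigned char *dst, const unsigned char *src,
                                  size_t len, size_t poison_off)
{
    size_t off = 0;
    uint64_t w;

    assert(dst && src);

    while (len - off >= sizeof(w)) {
        if (load_word(src, off, poison_off, &w))
            goto fault;                 /* the only fault exit in the loop */
        memcpy(dst + off, &w, sizeof(w));
        off += sizeof(w);
    }

    while (off < len) {                 /* tail: fewer than 8 bytes remain */
        dst[off] = src[off];
        off++;
    }

fault:
    return len - off;                   /* one place computes the remainder */
}
```

With one load per iteration there is one fault site and one label, so the remainder is always len minus the current offset. Whether a loop like this performs acceptably on pmem is exactly the open question in this thread, and the sketch makes no claim either way.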