On Tue, May 1, 2018 at 9:00 PM Dan Williams <dan.j.willi...@intel.com>
wrote:
> >
> > I have some dim memory of "rep movs doesn't work well for pmem", but does
> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
> > designed it, and why is anybody using it?
> >

> I think this is an FAQ from the original submission, in fact some guy
> named "Linus Torvalds" asked [1]:

Oh, I already mentioned that I remembered that "rep movs" didn't work well.

But there's a big gap between "just use 'rep movs'" and "do some cacheline
unrolling".

Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
the unrolling that it will already do on its own?

I may have gotten that answered too, but there's no comment in the code
about why it's such a disgusting mess, so I've long since forgotten _why_
it's such a disgusting mess.

That loop unrolling _used_ to be "hey, it's simple".

Now it's "Hey, that's truly disgusting", with the separate fault handling
for every single case in the unrolled loop.

Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
labels, and getting the number of bytes copied right.

And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
COULD GET RID OF ALL OF THOSE?"
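
A minimal sketch of the non-unrolled alternative being asked about, written as
user-space C rather than the kernel's assembly. Everything named here is
hypothetical: load_word_fn stands in for a load that can fault, demo_load and
poison_addr only simulate such a fault, and the tail copy is assumed not to
fault to keep the sketch short. The point it illustrates is that a
word-at-a-time loop needs one fault exit, and the "bytes not copied" count
falls out of the loop counter instead of per-slot fixup labels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a load that may fault (e.g. a machine check).
 * Returns 0 on success, -1 on a simulated fault. */
typedef int (*load_word_fn)(const void *src, uint64_t *out);

/* Hypothetical: the address whose load is treated as faulting. */
static const void *poison_addr;

static int demo_load(const void *src, uint64_t *out)
{
    if (src == poison_addr)
        return -1;
    memcpy(out, src, sizeof(*out));
    return 0;
}

/* Copy len bytes one word at a time.
 * Returns the number of bytes NOT copied (0 on full success). */
static size_t copy_word_at_a_time(void *dst, const void *src, size_t len,
                                  load_word_fn load)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= sizeof(uint64_t)) {
        uint64_t w;

        if (load(s, &w) != 0)
            return len;            /* the single fault exit */
        memcpy(d, &w, sizeof(w));
        d += sizeof(w);
        s += sizeof(w);
        len -= sizeof(w);
    }

    /* Trailing bytes; assumed not to fault in this sketch. */
    while (len) {
        *d++ = *s++;
        len--;
    }
    return 0;
}
```

With one load per iteration, a fault at any word leaves len holding exactly
the remaining byte count, so there is nothing extra to compute on the error
path.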

Maybe you already did ask yourself.  But I'm asking because it sure isn't
explained in the code.

             Linus
