On Wed, 30 May 2018, Jeff Moyer wrote:

> Dan Williams <dan.j.willi...@intel.com> writes:
> 
> > When I read your patch I came away with the impression that ARM had
> > not added memcpy_flushcache() yet and you were working around that
> > fact. Now that I look, ARM *does* define memcpy_flushcache() and
> > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on
> > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM
> > memcpy_flushcache() implementation is:
> >
> >     memcpy(dst, src, cnt);
> >     __clean_dcache_area_pop(dst, cnt);
> >
> > So, I do not see how what you're doing is any less work unless you are
> > flushing less than you copy?
> >
> > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the
> > ARM implementation is broken and that needs to be addressed not worked
> > around in a driver.
> 
> I think Mikulas wanted to batch up multiple copies and flush at the
> end.  According to his commit message, that batching gained him 2%
> performance.
> 
> -Jeff

No - this 2% difference is inlined memcpy_flushcache() vs out-of-line 
memcpy_flushcache().

I thought that dax_flush() performed 12% better than memcpy_flushcache(),
but the reason it performed better was that it was not flushing the
cache at all.
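
For illustration only, here is a user-space sketch of the two patterns
discussed in this thread: copy plus write-back on every call (the shape of
the quoted ARM memcpy_flushcache() body) versus several plain copies
followed by one write-back. It is not kernel code. fake_wb() is a
hypothetical stand-in for a primitive such as arch_wb_pmem(); it only counts
calls and bytes, and the helper names copy_and_flush() and batched_copy()
are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache write-back primitive such as
 * arch_wb_pmem(). It only records how often it was called and how many
 * bytes it was asked to cover; it does not touch the CPU cache. */
static size_t flush_calls;
static size_t flushed_bytes;

static void fake_wb(const void *addr, size_t len)
{
	(void)addr;
	flush_calls++;
	flushed_bytes += len;
}

/* Pattern 1: copy and write back in one step. */
static void copy_and_flush(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	fake_wb(dst, len);
}

/* Pattern 2: n plain copies into a contiguous region, then a single
 * write-back covering the whole region. */
static void batched_copy(char *dst, const char *src, size_t chunk, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		memcpy(dst + i * chunk, src + i * chunk, chunk);
	fake_wb(dst, chunk * n);
}
```

Both patterns ask for the same number of bytes to be written back; they
differ only in how many write-back calls are issued. The sketch says nothing
about which one is faster on real hardware.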

Mikulas
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
