On Wed, 30 May 2018, Jeff Moyer wrote:
> Dan Williams <dan.j.willi...@intel.com> writes:
>
> > When I read your patch I came away with the impression that ARM had
> > not added memcpy_flushcache() yet and you were working around that
> > fact. Now that I look, ARM *does* define memcpy_flushcache() and
> > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on
> > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM
> > memcpy_flushcache() implementation is:
> >
> > memcpy(dst, src, cnt);
> > __clean_dcache_area_pop(dst, cnt);
> >
> > So, I do not see how what you're doing is any less work unless you are
> > flushing less than you copy?
> >
> > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the
> > ARM implementation is broken and that needs to be addressed not worked
> > around in a driver.
>
> I think Mikulas wanted to batch up multiple copies and flush at the
> end. According to his commit message, that batching gained him 2%
> performance.
>
> -Jeff
No, the 2% difference is between inlined memcpy_flushcache() and
out-of-line memcpy_flushcache().

I thought that dax_flush() performed 12% better than memcpy_flushcache(),
but the reason it performed better was that it was not flushing the
cache at all.
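
For reference, the two patterns compared in this thread can be sketched in
user space as below. This is only an illustration under stated assumptions:
fake_wb() is a hypothetical stand-in that counts calls and bytes instead of
writing back cache lines, and copy_flush_each() / copy_then_flush_once() are
made-up names, not kernel functions and not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache write-back such as arch_wb_pmem().
 * It does not touch the cache; it only records how it was called. */
static size_t wb_calls, wb_bytes;

static void fake_wb(const void *addr, size_t len)
{
	(void)addr;
	wb_calls++;
	wb_bytes += len;
}

/* Pattern 1: copy and write back each piece, which is the shape of the
 * memcpy() + __clean_dcache_area_pop() sequence quoted above. */
static void copy_flush_each(char *dst, const char *src,
			    size_t piece, size_t n)
{
	assert(piece > 0);
	for (size_t i = 0; i < n; i++) {
		memcpy(dst + i * piece, src + i * piece, piece);
		fake_wb(dst + i * piece, piece);
	}
}

/* Pattern 2: copy every piece first, then issue a single write-back
 * covering the whole destination range. */
static void copy_then_flush_once(char *dst, const char *src,
				 size_t piece, size_t n)
{
	assert(piece > 0);
	for (size_t i = 0; i < n; i++)
		memcpy(dst + i * piece, src + i * piece, piece);
	fake_wb(dst, piece * n);
}
```

In this sketch both patterns write back the same number of bytes and differ
only in the number of calls, so any real speedup from batching would have to
come from call overhead or from flushing less than was copied.
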
Mikulas