Re: [dax] 23c84eb783: fio.write_bw_MBps -61.6% regression

2019-10-19 Thread Matthew Wilcox
On Fri, Oct 18, 2019 at 04:12:03PM -0700, Dan Williams wrote:
> I've got several reports of v5.3 performance regressions tracking back
> to this change. I instrumented the ndctl "dax.sh" unit test to
> validate that it is getting huge page faults and it always falls back
> to 4K starting with these commits. It looks like xa_is_internal()
> returns true for any DAX_LOCKED entry.

That's not true today, but I do intend to make it true at some point.
I think we can reclaim three bits from the encoding of a DAX entry,
allowing us to support three more physical bits on a 32-bit system.
Clearly that hasn't been a focus so far.
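
To illustrate why a locked DAX entry is not an internal entry today, here
is a minimal userspace sketch. It assumes the tagging helpers match
include/linux/xarray.h and the flag layout matches fs/dax.c around v5.3;
the helpers are re-implemented on uintptr_t rather than taken from the
kernel, and check_locked_entry_is_not_internal() is a hypothetical name
used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the XArray tagging helpers (assumed to mirror xarray.h). */
static inline uintptr_t xa_mk_value(uintptr_t v)    { return (v << 1) | 1; }
static inline bool      xa_is_value(uintptr_t e)    { return (e & 1) != 0; }
static inline uintptr_t xa_mk_internal(uintptr_t v) { return (v << 2) | 2; }
static inline bool      xa_is_internal(uintptr_t e) { return (e & 3) == 2; }

/* DAX flag bits, assumed to match fs/dax.c around v5.3. */
#define DAX_SHIFT     4
#define DAX_LOCKED    (1UL << 0)
#define DAX_PMD       (1UL << 1)
#define DAX_ZERO_PAGE (1UL << 2)
#define DAX_EMPTY     (1UL << 3)

/* A DAX entry is a value entry: flags in the low bits, pfn above them. */
static inline uintptr_t dax_make_entry(uintptr_t pfn, unsigned long flags)
{
	return xa_mk_value(flags | (pfn << DAX_SHIFT));
}

/* Strip the value tag (one bit) and test the lock flag. */
static inline bool dax_is_locked(uintptr_t e)
{
	return ((e >> 1) & DAX_LOCKED) != 0;
}

/* Hypothetical self-check: a locked PMD entry is a value, not internal. */
static void check_locked_entry_is_not_internal(void)
{
	uintptr_t e = dax_make_entry(0x1234, DAX_PMD | DAX_LOCKED);

	assert(xa_is_value(e));      /* low bit is 1 */
	assert(!xa_is_internal(e));  /* low two bits are 11, not 10 */
	assert(dax_is_locked(e));

	/* An internal entry (256 is an arbitrary payload here) has low bits 10. */
	assert(xa_is_internal(xa_mk_internal(256)));
}
```

Because every value entry has its low bit set, (entry & 3) can never equal
2 for an entry built by dax_make_entry(), whatever flags it carries.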

The plan is ...

DAX_LOCKED -> XA_LOCK_ENTRY (xa_mk_internal(something))

DAX_ZERO_PAGE -> XA_ZERO_ENTRY

DAX_EMPTY goes away.  It's only used in combination with DAX_LOCKED, and
it won't be necessary once DAX_LOCKED has become XA_LOCK_ENTRY.

DAX_PMD essentially stays, but we can encode arbitrary orders using a single
bit rather than just PTE vs PMD.

We may need to encode a size in DAX_LOCKED, or we may be able to get that
information from the XArray.  Anyway, this transformation is about tenth
on my todo list right now, so if someone else wants to take this on ...
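
The "three more physical bits" arithmetic can be sketched as below. This
assumes a value entry spends one bit on the XArray value tag, that the
current encoding spends four flag bits (DAX_SHIFT == 4), and that only the
single PMD/order bit would remain in the value after the plan above.
dax_pfn_bits() is a hypothetical helper for illustration, not a kernel
function.

```c
#include <assert.h>

/* One bit of every value entry is consumed by the XArray value tag. */
#define XA_VALUE_TAG_BITS 1

/*
 * Hypothetical helper: number of pfn bits left in an unsigned long of
 * bits_per_long bits once the value tag and flag_bits DAX flags are taken.
 */
static inline unsigned int dax_pfn_bits(unsigned int bits_per_long,
					unsigned int flag_bits)
{
	return bits_per_long - XA_VALUE_TAG_BITS - flag_bits;
}

/* Hypothetical self-check of the 32-bit case discussed in the thread. */
static void check_pfn_bit_budget(void)
{
	unsigned int today = dax_pfn_bits(32, 4); /* LOCKED, PMD, ZERO_PAGE, EMPTY */
	unsigned int plan  = dax_pfn_bits(32, 1); /* only the PMD/order bit (assumed) */

	assert(plan - today == 3);
}
```

Under those assumptions a 32-bit system goes from 27 to 30 pfn bits, which
is the three-bit gain mentioned earlier in this message.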


Re: [dax] 23c84eb783: fio.write_bw_MBps -61.6% regression

2019-10-18 Thread Dan Williams
On Fri, Oct 18, 2019 at 2:48 AM Jan Kara wrote:
>
> Hello!
>
> On Fri 18-10-19 16:23:54, kernel test robot wrote:
> > FYI, we noticed a -61.6% regression of fio.write_bw_MBps due to commit:
> >
> >
> > commit: 23c84eb7837514e16d79ed6d849b13745e0ce688 ("dax: Fix missed wakeup with PMD faults")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Thanks for the report! Please check whether commit 61c30c98ef17 "dax: Fix
> missed wakeup in put_unlocked_entry()" influences the throughput. Without
> that fix, the identified commit may result in processes sleeping
> unnecessarily long on entry locks.

I've got several reports of v5.3 performance regressions tracking back
to this change. I instrumented the ndctl "dax.sh" unit test to
validate that it is getting huge page faults and it always falls back
to 4K starting with these commits. It looks like xa_is_internal()
returns true for any DAX_LOCKED entry.