Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-15 Thread Christoph Hellwig
On Wed, Oct 12, 2016 at 08:06:40AM +1100, Dave Chinner wrote:
> Um, I seem to have completely missed that change - when did that
> happen and why?
> 
> Oh, it was part of the misguided "enable DAX on block devices"
> changes -

No, that commit just switched it to use ->f_mapping:

-   return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
+   return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);

The original version of it goes all the way back to introducing the
current-day DAX code in d475c6346 ("dax,ext2: replace XIP read and write
with DAX I/O").

> Hence I'd suggest that DAX check in io_is_direct() should be removed
> ASAP; the filesystems don't need it as they check the inode DAX
> state directly, and the code it "fixed" is no longer in the tree.

As long as ext4 still uses the overloaded direct_IO, we need the
checks for DAX in the filemap.c generic read/write code.  It seems
like that's only two spots anyway, but I'd feel much safer once ext4
is switched over to the iomap version of the dax code.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-11 Thread Dave Chinner
On Mon, Oct 10, 2016 at 02:34:34AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 10, 2016 at 05:07:45PM +1100, Dave Chinner wrote:
> > > > *However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
> > > > match the buffered IO single writer POSIX semantics - the test is a
> > > > bad test based on the fact it exercised a path that is under heavy
> > > > development and so can't be used as a regression test across
> > > > multiple kernels.
> > > 
> > > That being said - I wonder if we should allow the shared lock on DAX
> > > files IFF the user is specifying O_DIRECT in the open mode..
> > 
> > It should do - if it doesn't then we screwed up the IO path
> > selection logic in XFS and we'll need to fix it.
> 
> Depends on your definition of "we".  The DAX code has always abused the
> direct I/O path, and that abuse is ingrained in the VFS path in a way that
> we can't easily undo in XFS, e.g. take a look at io_is_direct and
> iocb_flags in include/linux/fs.h, which ensure that DAX I/O will always
> appear as IOCB_DIRECT to the fs.

Um, I seem to have completely missed that change - when did that
happen and why?

Oh, it was part of the misguided "enable DAX on block devices"
changes - it was supposed to fix a problem with block device access
using buffered IO instead of DAX (commit 65f87ee71852 "fs, block:
force direct-I/O for dax-enabled block devices").  The original
block device DAX code was reverted soon after this because it was
badly flawed, but this change was not removed at the same time
(probably because it was forgotten about).

Hence I'd suggest that DAX check in io_is_direct() should be removed
ASAP; the filesystems don't need it as they check the inode DAX
state directly, and the code it "fixed" is no longer in the tree.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-10 Thread Christoph Hellwig
On Mon, Oct 10, 2016 at 05:07:45PM +1100, Dave Chinner wrote:
> > > *However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
> > > match the buffered IO single writer POSIX semantics - the test is a
> > > bad test based on the fact it exercised a path that is under heavy
> > > development and so can't be used as a regression test across
> > > multiple kernels.
> > 
> > That being said - I wonder if we should allow the shared lock on DAX
> > files IFF the user is specifying O_DIRECT in the open mode..
> 
> It should do - if it doesn't then we screwed up the IO path
> selection logic in XFS and we'll need to fix it.

Depends on your definition of "we".  The DAX code has always abused the
direct I/O path, and that abuse is ingrained in the VFS path in a way that
we can't easily undo in XFS, e.g. take a look at io_is_direct and
iocb_flags in include/linux/fs.h, which ensure that DAX I/O will always
appear as IOCB_DIRECT to the fs.  It will take some time to untangle
this, but it's on my todo list once the last file system (ext4)
untangles the DAX and direct I/O path.


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-10 Thread Dave Chinner
On Sun, Oct 09, 2016 at 08:17:48AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 07, 2016 at 08:47:51AM +1100, Dave Chinner wrote:
> > Except that it's DAX, and in 4.7-rc1 that used shared locking at the
> > XFS level and never took exclusive locks.
> > 
> > *However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
> > match the buffered IO single writer POSIX semantics - the test is a
> > bad test based on the fact it exercised a path that is under heavy
> > development and so can't be used as a regression test across
> > multiple kernels.
> 
> That being said - I wonder if we should allow the shared lock on DAX
> files IFF the user is specifying O_DIRECT in the open mode..

It should do - if it doesn't then we screwed up the IO path
selection logic in XFS and we'll need to fix it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-09 Thread Christoph Hellwig
On Fri, Oct 07, 2016 at 08:47:51AM +1100, Dave Chinner wrote:
> Except that it's DAX, and in 4.7-rc1 that used shared locking at the
> XFS level and never took exclusive locks.
> 
> *However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
> match the buffered IO single writer POSIX semantics - the test is a
> bad test based on the fact it exercised a path that is under heavy
> development and so can't be used as a regression test across
> multiple kernels.

That being said - I wonder if we should allow the shared lock on DAX
files IFF the user is specifying O_DIRECT in the open mode..


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-07 Thread Waiman Long

On 10/06/2016 05:47 PM, Dave Chinner wrote:

On Thu, Oct 06, 2016 at 11:17:18AM -0700, Davidlohr Bueso wrote:

On Thu, 18 Aug 2016, Waiman Long wrote:


Currently, when down_read() fails, the active read locking isn't undone
until the rwsem_down_read_failed() function grabs the wait_lock. If the
wait_lock is contended, it may take a while to get the lock. During
that period, writer lock stealing will be disabled because of the
active read lock.

This patch will release the active read lock ASAP so that writer lock
stealing can happen sooner. The only downside is when the reader is
the first one in the wait queue as it has to issue another atomic
operation to update the count.

On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
fio tests with multithreaded randrw and randwrite workloads were run
against the same file on an XFS partition on top of an NVDIMM with
DAX. The aggregated bandwidths before and after the patch were as
follows:

Test       BW before patch  BW after patch  % change
----       ---------------  --------------  --------
randrw     1210 MB/s        1352 MB/s       +12%
randwrite  1622 MB/s        1710 MB/s       +5.4%

Yeah, this is really a bad workload to make decisions on locking
heuristics imo - if I'm thinking of the same workload. Mainly because
concurrent buffered io to the same file isn't very realistic and you
end up pathologically pounding on i_rwsem (which used to be until
recently i_mutex until Al's parallel lookup/readdir). Obviously write
lock stealing wins in this case.

Except that it's DAX, and in 4.7-rc1 that used shared locking at the
XFS level and never took exclusive locks.

*However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
match the buffered IO single writer POSIX semantics - the test is a
bad test based on the fact it exercised a path that is under heavy
development and so can't be used as a regression test across
multiple kernels.

If you want to stress concurrent access to a single file, please
use direct IO, not DAX or buffered IO.


Thanks for the update. I will change the test when I update this patch.

Cheers,
Longman


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-06 Thread Davidlohr Bueso

On Fri, 07 Oct 2016, Dave Chinner wrote:


Except that it's DAX


Duh, of course; silly me.

Thanks,
Davidlohr


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-06 Thread Dave Chinner
On Thu, Oct 06, 2016 at 11:17:18AM -0700, Davidlohr Bueso wrote:
> On Thu, 18 Aug 2016, Waiman Long wrote:
> 
> >Currently, when down_read() fails, the active read locking isn't undone
> >until the rwsem_down_read_failed() function grabs the wait_lock. If the
> >wait_lock is contended, it may take a while to get the lock. During
> >that period, writer lock stealing will be disabled because of the
> >active read lock.
> >
> >This patch will release the active read lock ASAP so that writer lock
> >stealing can happen sooner. The only downside is when the reader is
> >the first one in the wait queue as it has to issue another atomic
> >operation to update the count.
> >
> >On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
> >fio tests with multithreaded randrw and randwrite workloads were run
> >against the same file on an XFS partition on top of an NVDIMM with
> >DAX. The aggregated bandwidths before and after the patch were as
> >follows:
> >
> > Test       BW before patch  BW after patch  % change
> > ----       ---------------  --------------  --------
> > randrw     1210 MB/s        1352 MB/s       +12%
> > randwrite  1622 MB/s        1710 MB/s       +5.4%
> 
> Yeah, this is really a bad workload to make decisions on locking
> heuristics imo - if I'm thinking of the same workload. Mainly because
> concurrent buffered io to the same file isn't very realistic and you
> end up pathologically pounding on i_rwsem (which used to be until
> recently i_mutex until Al's parallel lookup/readdir). Obviously write
> lock stealing wins in this case.

Except that it's DAX, and in 4.7-rc1 that used shared locking at the
XFS level and never took exclusive locks.

*However*, the DAX IO path locking in XFS has changed in 4.9-rc1 to
match the buffered IO single writer POSIX semantics - the test is a
bad test based on the fact it exercised a path that is under heavy
development and so can't be used as a regression test across
multiple kernels.

If you want to stress concurrent access to a single file, please
use direct IO, not DAX or buffered IO.
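Dave's suggestion can be illustrated with a minimal fio job file (the filename and size below are placeholders, not from the thread); direct=1 opens the file with O_DIRECT, which is the path XFS serves with shared locking:

```
; hedged example; filename/size are placeholders
[global]
filename=/mnt/xfs/testfile
size=1g
direct=1          ; O_DIRECT: exercises the shared-locking direct IO path
ioengine=libaio
iodepth=16
numjobs=8
runtime=30
time_based
group_reporting

[randwrite]
rw=randwrite
bs=4k
```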

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-06 Thread Davidlohr Bueso

On Thu, 18 Aug 2016, Waiman Long wrote:


Currently, when down_read() fails, the active read locking isn't undone
until the rwsem_down_read_failed() function grabs the wait_lock. If the
wait_lock is contended, it may take a while to get the lock. During
that period, writer lock stealing will be disabled because of the
active read lock.

This patch will release the active read lock ASAP so that writer lock
stealing can happen sooner. The only downside is when the reader is
the first one in the wait queue as it has to issue another atomic
operation to update the count.

On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
fio tests with multithreaded randrw and randwrite workloads were run
against the same file on an XFS partition on top of an NVDIMM with
DAX. The aggregated bandwidths before and after the patch were as
follows:

 Test       BW before patch  BW after patch  % change
 ----       ---------------  --------------  --------
 randrw     1210 MB/s        1352 MB/s       +12%
 randwrite  1622 MB/s        1710 MB/s       +5.4%


Yeah, this is really a bad workload to make decisions on locking
heuristics imo - if I'm thinking of the same workload. Mainly because
concurrent buffered io to the same file isn't very realistic and you
end up pathologically pounding on i_rwsem (which used to be until
recently i_mutex until Al's parallel lookup/readdir). Obviously write
lock stealing wins in this case.

Thanks,
Davidlohr


[RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-08-18 Thread Waiman Long
Currently, when down_read() fails, the active read locking isn't undone
until the rwsem_down_read_failed() function grabs the wait_lock. If the
wait_lock is contended, it may take a while to get the lock. During
that period, writer lock stealing will be disabled because of the
active read lock.

This patch will release the active read lock ASAP so that writer lock
stealing can happen sooner. The only downside is when the reader is
the first one in the wait queue as it has to issue another atomic
operation to update the count.

On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
fio tests with multithreaded randrw and randwrite workloads were run
against the same file on an XFS partition on top of an NVDIMM with
DAX. The aggregated bandwidths before and after the patch were as
follows:

  Test       BW before patch  BW after patch  % change
  ----       ---------------  --------------  --------
  randrw     1210 MB/s        1352 MB/s       +12%
  randwrite  1622 MB/s        1710 MB/s       +5.4%

The write-only microbench also showed improvement because some read
locking was done by the XFS code.

Signed-off-by: Waiman Long 
---
 kernel/locking/rwsem-xadd.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..9309e72 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -222,21 +222,31 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 __visible
 struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	long count, adjustment = 0;
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk = current;
 	WAKE_Q(wake_q);
 
+	/*
+	 * Undo read bias from down_read operation to stop active locking.
+	 * Doing that after taking the wait_lock may block writer lock
+	 * stealing for too long impacting system performance.
+	 */
+	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+
 	waiter.task = tsk;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		adjustment = RWSEM_WAITING_BIAS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	/* we're now waiting on the lock */
+	if (adjustment)
+		count = atomic_long_add_return(adjustment, &sem->count);
+	else
+		count = atomic_long_read(&sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -245,8 +255,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 	 * wake our own waiter to join the existing active readers !
 	 */
 	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	    (count > RWSEM_WAITING_BIAS && adjustment))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
-- 
1.7.1
