Re: [PATCH] nfsd + scalability

2001-02-22 Thread Neil Brown

On Thursday February 22, [EMAIL PROTECTED] wrote:
> On Thu, 22 Feb 2001, Neil Brown wrote:
> > 
> > Certainly I would like to not hold the BKL so much, but I'm curious
> > how much effect it will really have.  Do you have any data on the
> > effect of this change?
> 
>   Depends very much on the hardware configuration, and underlying
> filesystem.
>   On an I/O bound system, obviously this has little effect.
> 
>   Using a filesystem which has a quite deep stack (CPU cycle heavy),
> this is a big win.  I've been running with this for so long that I can't
> find my original data files at the moment, but it was around +8%
> improvement in throughput for a 4-way box under SpecFS with vxfs as the
> underlying filesystem.  Less benefit for ext2 (all filesystems NFS
> exported "sync" and "no_subtree_check").  Some of the benefit came from
> the fact that there is also a volume manager sitting under the filesystem
> (more CPU cycles with the kernel lock held!).
> [... and more ...]

Well, you are quite convincing.  I am keen to see if I can measure a
performance difference for ext2/raid5, but unfortunately I don't have
any 4-way boxes (only duals).

If you progress with this and create any patches that do more
interesting things than just drop the lock (e.g. protect the export
table or the read-ahead cache properly), let me know and I will
certainly take them.  Meanwhile I will try to find time to have a look
myself.

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] nfsd + scalability

2001-02-22 Thread Mark Hemment

On Thu, 22 Feb 2001, Neil Brown wrote:

> On Sunday February 18, [EMAIL PROTECTED] wrote:
> > Hi Neil, all,
> > 
> >   The nfs daemons run holding the global kernel lock.  They still hold
> > this lock over calls to file_op's read and write.
> > [snip]
> >   Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
> > _big_ SMP scalability win!
> 
> Certainly I would like to not hold the BKL so much, but I'm curious
> how much effect it will really have.  Do you have any data on the
> effect of this change?

  Depends very much on the hardware configuration, and underlying
filesystem.
  On an I/O bound system, obviously this has little effect.

  Using a filesystem which has a quite deep stack (CPU cycle heavy),
this is a big win.  I've been running with this for so long that I can't
find my original data files at the moment, but it was around +8%
improvement in throughput for a 4-way box under SpecFS with vxfs as the
underlying filesystem.  Less benefit for ext2 (all filesystems NFS
exported "sync" and "no_subtree_check").  Some of the benefit came from
the fact that there is also a volume manager sitting under the filesystem
(more CPU cycles with the kernel lock held!).

  Holding the kernel lock for fewer cycles has an important side benefit!
  If it is held for fewer cycles, then the probability of it being held
when an interrupt is processed is reduced.  This benefit is further
enhanced if there are bottom-halves running off the back of the interrupt
(such as networking code).
  I need to get actual figures for how many cycles we are spinning at the
reacquire_kernel_lock() (in schedule()), but my gut feeling is that we
aren't doing too well when running as a file server.

> Also, I would much rather drop the lock in nfsd_dispatch() and then go
> around reclaiming it wherever it was needed.
> Subsequently we would drop the lock in nfsd() and make sure sunrpc is
> safe.

  sunrpc definitely tries to be SMP safe, but I haven't convinced myself that
it actually is.
  In sunrpc, dropping the kernel lock when checksumming the UDP packet, and
when sending, is another win-win case (again, I need to find my data files,
but this was approx +3% improvement in throughput).

> This would be more work (any volunteers?:-) but I feel that dropping
> it in little places like this is a bit unsatisfying.

  True, not 100% satisfying, but even little places give an overall
improvement in scalability.  If they can be proved to be correct, then
why not do it?  It moves the code in the right direction.

  I am planning on slowly expanding the dropping of the kernel lock within
NFS, rather than doing it in one single bang.  It looks doable, but there
_might_ be an issue with the interaction with the dcache.

Mark




Re: [PATCH] nfsd + scalability

2001-02-21 Thread Neil Brown

On Sunday February 18, [EMAIL PROTECTED] wrote:
> Hi Neil, all,
> 
>   The nfs daemons run holding the global kernel lock.  They still hold
> this lock over calls to file_op's read and write.
> 
> >   The file system kernel interface (FSKI) doesn't require the kernel lock
> > to be held over these read/write calls.  The nfs daemons do not require
> > that the reads or writes do not block (it would be very silly if they did),
> > so they have no guarantee the lock isn't dropped and retaken during
> > blocking.  I.e. they aren't using it as a guard across the calls.
> 
>   Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
> _big_ SMP scalability win!

Certainly I would like to not hold the BKL so much, but I'm curious
how much effect it will really have.  Do you have any data on the
effect of this change?

Also, I would much rather drop the lock in nfsd_dispatch() and then go
around reclaiming it wherever it was needed.
Subsequently we would drop the lock in nfsd() and make sure sunrpc is
safe. 
This would be more work (any volunteers?:-) but I feel that dropping
it in little places like this is a bit unsatisfying.

Thanks for the input,

NeilBrown


[patch deleted]



[PATCH] nfsd + scalability

2001-02-18 Thread Mark Hemment

Hi Neil, all,

  The nfs daemons run holding the global kernel lock.  They still hold
this lock over calls to file_op's read and write.

  The file system kernel interface (FSKI) doesn't require the kernel lock
to be held over these read/write calls.  The nfs daemons do not require
that the reads or writes do not block (it would be very silly if they did),
so they have no guarantee the lock isn't dropped and retaken during
blocking.  I.e. they aren't using it as a guard across the calls.

  Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
_big_ SMP scalability win!

  Attached patch is against 2.4.1-ac18, but should apply to most recent
kernel versions.

Mark


--- vanilla-2.4.1-ac18/fs/nfsd/vfs.c    Sun Feb 18 15:06:27 2001
+++ markhe-2.4.1-ac18/fs/nfsd/vfs.c Sun Feb 18 19:32:18 2001
@@ -30,6 +30,7 @@
 #include <linux/net.h>
 #include <linux/unistd.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/in.h>
 #define __NO_VERSION__
 #include <linux/module.h>
@@ -602,12 +603,28 @@
file.f_ralen = ra->p_ralen;
file.f_rawin = ra->p_rawin;
}
+
+   /*
+* The nfs daemons run holding the global kernel lock, but
+* f_op->read() doesn't need the lock to be held.
+* Drop it here to help scalability.
+*
+* The "kernel_locked()" test isn't perfect (someone else could be
+* holding the lock when we're not), but it will eventually catch
+* any cases of entering here without the lock held.
+*/
+   if (!kernel_locked())
+   BUG();
+   unlock_kernel();
+
file.f_pos = offset;
 
oldfs = get_fs(); set_fs(KERNEL_DS);
err = file.f_op->read(&file, buf, *count, &file.f_pos);
set_fs(oldfs);
 
+   lock_kernel();
+
/* Write back readahead params */
if (ra != NULL) {
dprintk("nfsd: raparms %ld %ld %ld %ld %ld\n",
@@ -664,6 +681,22 @@
goto out_close;
 #endif
 
+   /*
+* The nfs daemons run holding the global kernel lock, but
+* f_op->write() doesn't need the lock to be held.
+* Also, as the struct file is private, the export is read-locked,
+* and the inode attached to the dentry cannot change under us, the
+* lock can be dropped ahead of the call to write() for even better
+* scalability.
+*
+* The "kernel_locked()" test isn't perfect (someone else could be
+* holding the lock when we're not), but it will eventually catch
+* any cases of entering here without the lock held.
+*/
+   if (!kernel_locked())
+   BUG();
+   unlock_kernel();
+
dentry = file.f_dentry;
inode = dentry->d_inode;
exp   = fhp->fh_export;
@@ -692,9 +725,12 @@
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
err = file.f_op->write(&file, buf, cnt, &file.f_pos);
+   set_fs(oldfs);
+
+   lock_kernel();
+
if (err >= 0)
nfsdstats.io_write += cnt;
-   set_fs(oldfs);
 
/* clear setuid/setgid flag after write */
if (err >= 0 && (inode->i_mode & (S_ISUID | S_ISGID))) {


