Re: NFS client/buffer cache deadlock

2005-05-03 Thread Brian Fundakowski Feldman
On Tue, May 03, 2005 at 11:47:00AM +0200, Marc Olzheim wrote:
 On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
  Alright, this will do synchronous, instead of short, writes (also,
  of course, not deadlock the system) if you are trying to use an
  excessively large buffer size.
  
  http://green.homeunix.org/~green/nfs_client.deadlock.patch
  http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch
 
 Will this be incorporated in time for 5.4 ?

Someone else really needs to review the code changes, and more than
just conceptually, before an adjustment like this is made ahead of a
release.  It is not truly an optimal solution, as fully synchronous
writes are not necessary; it would be enough to limit the write window
size and require posted transactions to complete before queueing up
more.  Doing that is more error-prone, however, and I think it would
complicate things just to optimize the speed of a rare case.

Still, there are probably a few who would object, in which case they
should do the work of optimizing that side case ;) An actual
mount_nfs(8) configuration flag and documentation are still missing,
but those things are trivial.
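
The alternative mentioned above (a limited write window, with posted
transactions completing before more are queued) can be illustrated in
user space.  This is only a rough Python analogy, not the kernel patch:
windowed_write, send_block and the window of 4 are hypothetical names
and values chosen for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def windowed_write(blocks, send_block, window=4):
    """Post blocks asynchronously with at most `window` of them in flight.

    Returns the highest number of blocks that were in flight at once.
    """
    slots = threading.BoundedSemaphore(window)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def post(index, data):
        nonlocal in_flight
        try:
            send_block(index, data)      # stands in for one write transaction
        finally:
            with lock:
                in_flight -= 1
            slots.release()              # completion frees a window slot

    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = []
        for index, data in enumerate(blocks):
            slots.acquire()              # block until an earlier post completes
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            futures.append(pool.submit(post, index, data))
        for future in futures:
            future.result()              # re-raise any error from send_block
    return peak

# Hypothetical demo: 32 blocks of 8192 bytes, window of 4.
received = []
peak = windowed_write([b"x" * 8192] * 32,
                      lambda index, data: received.append(index),
                      window=4)
```

A window of 1 degenerates into the fully synchronous behaviour of the
patch; larger windows trade memory held in flight for throughput.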

(Forwarded on to -current as well, for additional eyes/testers.)

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NFS client/buffer cache deadlock

2005-04-27 Thread Marc Olzheim
On Tue, Apr 26, 2005 at 03:36:02PM -0400, Brian Fundakowski Feldman wrote:
 I'm still guessing that for whatever reason your writes on the FreeBSD
 4.x NFS client are not using NFSv3/transactions.  The second method
 I just now implemented; it works fine except for being slower since
 all data is acknowledged synchronously.  Are you using one writev()
 instead of many writes so you can atomically write a large sparse data
 structure?  If so, you will probably just have to cope with the lower
 performance than for reasonably-sized writes.  If not: why are you
 trying to write it atomically?  Just use multiple normal-sized write()
 calls.

Yes, a single writev(). Just like in the kern/79207 PR.

It doesn't have to be superfast (why would I use NFS otherwise), just as
long as it's threadsafe / atomic.
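
For reference, a single gather write of this kind can be sketched with
Python's os.writev().  This is not the test program from the PR:
gather_write, the file name and the 20 * 64 KiB buffers are assumptions
picked so the demo stays small (the runs discussed here used 100 MB
buffers).

```python
import os
import tempfile

def gather_write(path, buffers):
    """Write all buffers with one writev() call.

    Returns (bytes written, bytes requested); the two differ on a short write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.writev(fd, buffers)
    finally:
        os.close(fd)
    return written, sum(len(b) for b in buffers)

# Demo on a temporary local file with 20 buffers of 64 KiB each.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo0")
    buffers = [bytes([i]) * 65536 for i in range(20)]
    written, requested = gather_write(target, buffers)
    size_on_disk = os.path.getsize(target)
```

Comparing written with requested is how a caller would notice a short
write, which is the behaviour the patch replaces with synchronous
writes.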

  Btw. running the writev program with 20 * 100 MB on UFS on a 512MB
  FreeBSD 6-CURRENT system practically locks the filesystem down _and_
  causes all processes to be swapped out in favor of the buffer cache.
  'top' however, doesn't show a rise in BUF usage.
  
  On FreeBSD 4.x, the system performs as usual during the writev to
  UFS.
 
 That's certainly not very optimal.  I don't know anything about it, sorry.

No problem. I just thought that they might be related (this is with a
single writev() as well).

Marc




Re: NFS client/buffer cache deadlock

2005-04-27 Thread Brian Fundakowski Feldman
On Wed, Apr 27, 2005 at 10:17:46AM +0200, Marc Olzheim wrote:
 On Tue, Apr 26, 2005 at 03:36:02PM -0400, Brian Fundakowski Feldman wrote:
  I'm still guessing that for whatever reason your writes on the FreeBSD
  4.x NFS client are not using NFSv3/transactions.  The second method
  I just now implemented; it works fine except for being slower since
  all data is acknowledged synchronously.  Are you using one writev()
  instead of many writes so you can atomically write a large sparse data
  structure?  If so, you will probably just have to cope with the lower
  performance than for reasonably-sized writes.  If not: why are you
  trying to write it atomically?  Just use multiple normal-sized write()
  calls.
 
 Yes, a single writev(). Just like in the kern/79207 PR.
 
 It doesn't have to be superfast (why would I use NFS otherwise), just as
 long as it's threadsafe / atomic.

Alright, this will do synchronous, instead of short, writes (also,
of course, not deadlock the system) if you are trying to use an
excessively large buffer size.

http://green.homeunix.org/~green/nfs_client.deadlock.patch
http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: NFS client/buffer cache deadlock

2005-04-27 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
  Yes, a single writev(). Just like in the kern/79207 PR.
  
  It doesn't have to be superfast (why would I use NFS otherwise), just as
  long as it's threadsafe / atomic.
 
 Alright, this will do synchronous, instead of short, writes (also,
 of course, not deadlock the system) if you are trying to use an
 excessively large buffer size.
 
 http://green.homeunix.org/~green/nfs_client.deadlock.patch
 http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch

Great! This seems to do the trick and isn't that slow (about 2.8 MB/sec
over 100 MBit, writing 600 * 1MB, 4.x gets about 5.5 MB/sec on the same
system); it's fast enough for me and more importantly, it doesn't lock
the system down anymore. ;-)
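
As a back-of-the-envelope check of what those rates mean for the
600 * 1MB run, a small calculation follows.  transfer_seconds is a
hypothetical helper, and the 12.5 MB/sec figure is simply 100 Mbit/s
divided by 8, ignoring all protocol overhead.

```python
def transfer_seconds(total_mb, rate_mb_per_s):
    """Time needed to move total_mb megabytes at a steady rate."""
    return total_mb / rate_mb_per_s

total_mb = 600 * 1                              # 600 writes of 1 MB
patched_secs = transfer_seconds(total_mb, 2.8)  # patched client
fbsd4_secs = transfer_seconds(total_mb, 5.5)    # 4.x on the same system
relative_speed = 2.8 / 5.5                      # patched rate as a fraction of 4.x
wire_limit_mb_per_s = 100 / 8                   # raw 100 Mbit/s link

print(f"patched: {patched_secs:.0f} s, 4.x: {fbsd4_secs:.0f} s, "
      f"ratio: {relative_speed:.2f}, link limit: {wire_limit_mb_per_s} MB/sec")
```

So the synchronous path takes roughly twice as long as 4.x for the
same data.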

Thanks a lot!

Marc




Re: NFS client/buffer cache deadlock

2005-04-27 Thread Brian Fundakowski Feldman
On Wed, Apr 27, 2005 at 07:15:23PM +0200, Marc Olzheim wrote:
 On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
   Yes, a single writev(). Just like in the kern/79207 PR.
   
   It doesn't have to be superfast (why would I use NFS otherwise), just as
   long as it's threadsafe / atomic.
  
  Alright, this will do synchronous, instead of short, writes (also,
  of course, not deadlock the system) if you are trying to use an
  excessively large buffer size.
  
  http://green.homeunix.org/~green/nfs_client.deadlock.patch
  http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch
 
 Great! This seems to do the trick and isn't that slow (about 2.8 MB/sec
 over 100 MBit, writing 600 * 1MB, 4.x gets about 5.5 MB/sec on the same
 system); it's fast enough for me and more importantly, it doesn't lock
 the system down anymore. ;-)
 
 Thanks a lot!

Alright, thanks for helping with this :-) Do you think you can find
a way to tell if in 4.x you're actually using NFSv3/transactions?  I
would really like to know why 4.x isn't deadlocking, and that's the
most plausible explanation I can think of right now.

The behavior could also be totally different due to nfsiod, lack thereof,
or its settings.  Are you running nfsiod on 4.x, and if so, how many?
On 5.x+ the default maximum of nfsiods is 20.

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: NFS client/buffer cache deadlock

2005-04-27 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 02:34:35PM -0400, Brian Fundakowski Feldman wrote:
 Alright, thanks for helping with this :-) Do you think you can find
 a way to tell if in 4.x you're actually using NFSv3/transactions?  I
 would really like to know why 4.x isn't deadlocking, and that's the
 most plausible explanation I can think of right now.
 
 The behavior could also be totally different due to nfsiod, lack thereof,
 or its settings.  Are you running nfsiod on 4.x, and if so, how many?
 On 5.x+ the default maximum of nfsiods is 20.

The install I ran the speedtest on is:

FreeBSD baroque.ilse.net 4.10-STABLE FreeBSD 4.10-STABLE #23: Wed Aug  4 
15:18:52 CEST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BAROQUE i386

and runs 6 nfsiods.

How can I tell whether it uses transactions ?

The NFS server is mounted with:

mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path 
/nfs/host/path

Marc




Re: NFS client/buffer cache deadlock

2005-04-27 Thread Brian Fundakowski Feldman
On Wed, Apr 27, 2005 at 08:42:03PM +0200, Marc Olzheim wrote:
 On Wed, Apr 27, 2005 at 02:34:35PM -0400, Brian Fundakowski Feldman wrote:
  Alright, thanks for helping with this :-) Do you think you can find
  a way to tell if in 4.x you're actually using NFSv3/transactions?  I
  would really like to know why 4.x isn't deadlocking, and that's the
  most plausible explanation I can think of right now.
  
  The behavior could also be totally different due to nfsiod, lack thereof,
  or its settings.  Are you running nfsiod on 4.x, and if so, how many?
  On 5.x+ the default maximum of nfsiods is 20.
 
 The install I ran the speedtest on is:
 
 FreeBSD baroque.ilse.net 4.10-STABLE FreeBSD 4.10-STABLE #23: Wed Aug  4 
 15:18:52 CEST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BAROQUE i386
 
 and runs 6 nfsiods.
 
 How can I tell whether it uses transactions ?

I am not sure -- it should with NFSv3 though.  Does mount -v tell
you anything more detailed?  I suppose that nfsstat may also be used
-- the commit count should only increase when it does an NFSv3
commit operation.

 The NFS server is mounted with:
 
 mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path 
 /nfs/host/path

So the only really visible difference is 6 nfsiod versus 20 nfsiod.  I
think if you increased the nfsiod on that 4.x box to something much
higher it would deadlock the same, then, assuming that all else is
equal.

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: NFS client/buffer cache deadlock

2005-04-27 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 03:06:27PM -0400, Brian Fundakowski Feldman wrote:
  How can I tell whether it uses transactions ?
 
 I am not sure -- it should with NFSv3 though.  Does mount -v tell
 you anything more detailed?  I suppose that nfsstat may also be used
 -- the commit count should only increase when it does an NFSv3
 commit operation.
 
  The NFS server is mounted with:
  
  mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path 
  /nfs/host/path
 
 So the only really visible difference is 6 nfsiod versus 20 nfsiod.  I
 think if you increased the nfsiod on that 4.x box to something much
 higher it would deadlock the same, then, assuming that all else is
 equal.

Well, I took on that challenge and although it seems that the machine
completely hangs itself up for a while, it snaps out of it after it is
done.

tethereal tells me it is NFSv3, and the commits increase:

baroque:/se3/spiderbak>nfsstat
Client Info:
Rpc Counts:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
15272 20119   3401842 82797 548076917594394 43551 42697
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
1307727 11573  2957  2959 17178 0  24938388
MknodFsstatFsinfo  PathConfCommitGLeaseVacate Evict
0 6178814 0191593 0 0 0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
0 0210711330710 577517143
Cache Info:
Attr HitsMisses Lkup HitsMisses BioR HitsMisses BioW HitsMisses
376190541  24213879  94021997   3401840 325304623 545060579699854594394
BioRLHitsMisses BioD HitsMisses DirE HitsMisses
  3051125 82797 88340 17145 42013 0

Server Info:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
0 0 0 0 0 0 0 0
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
0 0 0 0 0 0 0 0
MknodFsstatFsinfo  PathConfCommitGLeaseVacate Evict
0 0 0 0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
   Inprog  Idem  Non-idemMisses
0 0 0 0
Server Lease Stats:
   Leases PeakL   GLeases
0 0 0
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
0 0 0
baroque:/se3/spiderbak>/home/marcolz/src/writev 100 foo0
nfsstat
baroque:/se3/spiderbak>nfsstat
Client Info:
Rpc Counts:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
15272 20119   3401879 82797 548082677607194 43552 42697
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
1307727 11573  2957  2959 17178 0  24938698
MknodFsstatFsinfo  PathConfCommitGLeaseVacate Evict
0 6178814 0193252 0 0 0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
0 0210711331410 577537710
Cache Info:
Attr HitsMisses Lkup HitsMisses BioR HitsMisses BioW HitsMisses
376195207  24214181  94022936   3401877 325306477 545066339699854607194
BioRLHitsMisses BioD HitsMisses DirE HitsMisses
  3051125 82797 88340 17145 42013 0

Server Info:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
0 0 0 0 0 0 0 0
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
0 0 0 0 0 0 0 0
MknodFsstatFsinfo  PathConfCommitGLeaseVacate Evict
0 0 0 0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
   Inprog  Idem  Non-idemMisses
0 0 0 0
Server Lease Stats:
   Leases PeakL   GLeases
0 0 0
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
0 0 0
baroque:/se3/spiderbak>
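
The two client dumps can be compared with a little arithmetic.  The
columns above are run together, so the values used here rest on an
assumed reading: "548076917594394" is taken as Read 54807691 and Write
7594394, "548082677607194" as Read 54808267 and Write 7607194, and
"0191593" / "0193252" as PathConf 0 followed by Commit.  rpc_delta is a
hypothetical helper name.

```python
WRITE_SIZE = 8192   # from the "-w 8192" mount option

# Assumed reading of the fused columns in the two client dumps.
before = {"Write": 7594394, "Commit": 191593}
after = {"Write": 7607194, "Commit": 193252}

def rpc_delta(before, after):
    """Per-counter increase between two nfsstat snapshots."""
    return {name: after[name] - before[name] for name in before}

delta = rpc_delta(before, after)
bytes_written = delta["Write"] * WRITE_SIZE
mib_written = bytes_written / (1024 * 1024)

print(delta, bytes_written, mib_written)
```

Under that reading the Write counter grows by 12800 RPCs, which at
8192 bytes each is exactly 100 MB and matches "writev 100", while the
Commit counter grows by 1659.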

Marc




[5.3Rp6] UFS buffer cache deadlock (was: NFS client/buffer cache deadlock)

2005-04-27 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 10:17:46AM +0200, Marc Olzheim wrote:
   Btw. running the writev program with 20 * 100 MB on UFS on a 512MB
   FreeBSD 6-CURRENT system practically locks the filesystem down _and_
   causes all processes to be swapped out in favor of the buffer cache.
   'top' however, doesn't show a rise in BUF usage.
   
   On FreeBSD 4.x, the system performs as usual during the writev to
   UFS.
  
  That's certainly not very optimal.  I don't know anything about it, sorry.
 
 No problem. I just thought that they might be related (this is with a
 single writev() as well).

Hmm, doing the 20 * 100MB on a 5.3-RELEASE-p6 system with 256 MB memory
and 512MB swap (on UFS) hangs the machine. Luckily, I cannot reproduce
it on 5.4-STABLE :-/

Marc




Re: NFS client/buffer cache deadlock

2005-04-27 Thread Brian Fundakowski Feldman
On Wed, Apr 27, 2005 at 09:19:38PM +0200, Marc Olzheim wrote:
 On Wed, Apr 27, 2005 at 03:06:27PM -0400, Brian Fundakowski Feldman wrote:
   How can I tell whether it uses transactions ?
  
  I am not sure -- it should with NFSv3 though.  Does mount -v tell
  you anything more detailed?  I suppose that nfsstat may also be used
  -- the commit count should only increase when it does an NFSv3
  commit operation.
  
   The NFS server is mounted with:
   
   mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path 
   /nfs/host/path
  
  So the only really visible difference is 6 nfsiod versus 20 nfsiod.  I
  think if you increased the nfsiod on that 4.x box to something much
  higher it would deadlock the same, then, assuming that all else is
  equal.
 
 Well, I took on that challenge and although it seems that the machine
 completely hangs itself up for a while, it snaps out of it after it is
 done.
 
 tethereal tells me it is NFSv3, and the commits increase:
 []

Alright, thanks for testing.  So the same bug exists in 4.x but may
not actually deadlock the buffer cache in the same circumstances.
If you feel like playing with it any more, I would expect that it
wouldn't be too hard to find a combination of write block size,
maximum nfsiod count, and total file size that deadlocks 4.x, too.

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: NFS client/buffer cache deadlock

2005-04-27 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 03:41:44PM -0400, Brian Fundakowski Feldman wrote:
 Alright, thanks for testing.  So the same bug exists in 4.x but may
 not actually deadlock the buffer cache in the same circumstances.
 If you feel like playing with it any more, I would expect that it
 wouldn't be too hard to find a combination of write block size,
 maximum nfsiod count, and total file size that deadlocks 4.x, too.

20 is the maximum number it is willing to start. With 21 and above it
tells me:

baroque:~# nfsiod -n 21  
nfsiod: nfsiod count 21; reset to 1
baroque:~#

:-P

Btw.: CPU time was nicely divided amongst all of the nfsiods when
doing the writev:

root 87373  0.0  0.0   212   32  ??  I 9:11PM   0:00.18 nfsiod -n 20
root 87372  0.0  0.0   212   32  ??  I 9:11PM   0:00.16 nfsiod -n 20
root 87371  0.0  0.0   212   32  ??  I 9:11PM   0:00.16 nfsiod -n 20
root 87370  0.0  0.0   212   32  ??  I 9:11PM   0:00.18 nfsiod -n 20
root 87369  0.0  0.0   212   32  ??  I 9:11PM   0:00.18 nfsiod -n 20
root 87368  0.0  0.0   212   32  ??  I 9:11PM   0:00.19 nfsiod -n 20
root 87367  0.0  0.0   212   32  ??  I 9:11PM   0:00.18 nfsiod -n 20
root 87366  0.0  0.0   212   32  ??  I 9:11PM   0:00.18 nfsiod -n 20
root 87365  0.0  0.0   212   32  ??  I 9:11PM   0:00.17 nfsiod -n 20
root 87364  0.0  0.0   212   32  ??  I 9:11PM   0:00.17 nfsiod -n 20
root 87363  0.0  0.0   212   32  ??  I 9:11PM   0:00.19 nfsiod -n 20
root 87362  0.0  0.0   212   32  ??  I 9:11PM   0:00.19 nfsiod -n 20
root 87361  0.0  0.0   212   32  ??  I 9:11PM   0:00.20 nfsiod -n 20
root 87360  0.0  0.0   212   32  ??  I 9:11PM   0:00.20 nfsiod -n 20
root 87359  0.0  0.0   212   32  ??  I 9:11PM   0:00.20 nfsiod -n 20
root 87358  0.0  0.0   212   32  ??  I 9:11PM   0:00.19 nfsiod -n 20
root 87357  0.0  0.0   212   32  ??  I 9:11PM   0:00.19 nfsiod -n 20
root 87356  0.0  0.0   212   32  ??  I 9:11PM   0:00.21 nfsiod -n 20
root 87355  0.0  0.0   212   32  ??  I 9:11PM   0:00.21 nfsiod -n 20
root 87354  0.0  0.0   212   32  ??  I 9:11PM   0:00.23 nfsiod -n 20

Marc




Re: NFS client/buffer cache deadlock

2005-04-26 Thread Marc Olzheim
[changed cc: from standards@ back to stable@ again.]

On Tue, Apr 26, 2005 at 12:25:49PM -0400, Brian Fundakowski Feldman wrote:
 You can assure that this happens in only two ways:
 
 1. Make a complete copy of the data.  This is what currently occurs:
    it gets stuffed into the buffer cache as the write happens.
 2. Keep the data around synchronously -- by virtue of the write system
    call being used synchronously, the thread's VM context is around,
    and duplication need not occur.

It seems as though FreeBSD 4.x either uses 2) or does something wrong
indeed. Why would 2) be a problem on FreeBSD 5.x? Can't the pages
being written from be locked during the write, instead of copied
internally?

Btw. running the writev program with 20 * 100 MB on UFS on a 512MB
FreeBSD 6-CURRENT system practically locks the filesystem down _and_
causes all processes to be swapped out in favor of the buffer cache.
'top' however, doesn't show a rise in BUF usage.

On FreeBSD 4.x, the system performs as usual during the writev to
UFS.

Marc




Re: NFS client/buffer cache deadlock

2005-04-26 Thread Brian Fundakowski Feldman
On Tue, Apr 26, 2005 at 06:43:46PM +0200, Marc Olzheim wrote:
 [changed cc: from standards@ back to stable@ again.]
 
 On Tue, Apr 26, 2005 at 12:25:49PM -0400, Brian Fundakowski Feldman wrote:
  You can assure that this happens in only two ways:
  
  1. Make a complete copy of the data.  This is what currently occurs:
     it gets stuffed into the buffer cache as the write happens.
  2. Keep the data around synchronously -- by virtue of the write system
     call being used synchronously, the thread's VM context is around,
     and duplication need not occur.
 
 It seems as though FreeBSD 4.x either uses 2) or does something wrong
 indeed. Why would 2) be a problem on FreeBSD 5.x? Can't the pages
 being written from be locked during the write, instead of copied
 internally?

I'm still guessing that for whatever reason your writes on the FreeBSD
4.x NFS client are not using NFSv3/transactions.  The second method
I just now implemented; it works fine except for being slower since
all data is acknowledged synchronously.  Are you using one writev()
instead of many writes so you can atomically write a large sparse data
structure?  If so, you will probably just have to cope with the lower
performance than for reasonably-sized writes.  If not: why are you
trying to write it atomically?  Just use multiple normal-sized write()
calls.
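
The "multiple normal-sized write() calls" approach can be sketched as
a simple loop.  This is a Python illustration only; write_in_chunks
and the 64 KiB chunk size are assumptions, since no particular size is
named here.

```python
import os
import tempfile

def write_in_chunks(fd, data, chunk_size=64 * 1024):
    """Write data with repeated write() calls of at most chunk_size bytes.

    Returns the number of write() calls that were needed.
    """
    view = memoryview(data)
    offset = 0
    calls = 0
    while offset < len(view):
        # write() may accept fewer bytes than offered, so advance by its result
        offset += os.write(fd, view[offset:offset + chunk_size])
        calls += 1
    return calls

# Demo on a temporary local file with 200000 bytes of random data.
payload = os.urandom(200_000)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo0")
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        calls = write_in_chunks(fd, payload)
    finally:
        os.close(fd)
    with open(target, "rb") as f:
        round_trip = f.read()
```

Unlike a single writev(), the chunks of such a loop can interleave with
writes from other threads or processes, which is the atomicity
trade-off being asked about.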

 Btw. running the writev program with 20 * 100 MB on UFS on a 512MB
 FreeBSD 6-CURRENT system practically locks the filesystem down _and_
 causes all processes to be swapped out in favor of the buffer cache.
 'top' however, doesn't show a rise in BUF usage.
 
 On FreeBSD 4.x, the system performs as usual during the writev to
 UFS.

That's certainly not very optimal.  I don't know anything about it, sorry.

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
   [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\