Re: NFS client/buffer cache deadlock
On Tue, May 03, 2005 at 11:47:00AM +0200, Marc Olzheim wrote:
> On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
> > Alright, this will do synchronous, instead of short, writes (also, of course, not deadlock the system) if you are trying to use an excessively large buffer size.
> > http://green.homeunix.org/~green/nfs_client.deadlock.patch
> > http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch
>
> Will this be incorporated in time for 5.4?

It really needs someone else to review the code changes more than just conceptually to make this kind of an adjustment before release. It is not truly an optimal solution, as fully synchronous writes are not necessary; just limiting the write window size and requiring posted transactions to complete before queueing up more is. Doing that is more error-prone, however, and would, I think, complicate things just to optimize the speed of a rare case. Still, there are probably a few who would object, in which case they should do the work of optimizing that side case ;)

An actual mount_nfs(8) configuration flag and documentation are still missing, but those things are trivial.

(Forwarded on to -current as well, for additional eyes/testers.)

--
Brian Fundakowski Feldman
[EMAIL PROTECTED]
[ FreeBSD ] The Power to Serve!
Opinions expressed are my own.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
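
The alternative mentioned in this message, limiting the write window and requiring posted writes to complete before queueing more, can be illustrated with a small user-space analogy. This is only a sketch in Python, not the kernel patch: the names `windowed_write` and `_pwrite_all`, the 8192-byte chunk size and the window of 4 are all assumptions chosen for illustration.

```python
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def _pwrite_all(fd, chunk, offset):
    """Write one chunk at a fixed offset, retrying on short writes."""
    written = 0
    while written < len(chunk):
        written += os.pwrite(fd, chunk[written:], offset + written)
    return written


def windowed_write(fd, data, chunk_size=8192, window=4):
    """Write `data` with at most `window` chunk writes outstanding.

    Hypothetical helper: a new chunk is only posted once the oldest
    outstanding one has completed, so the amount of queued data is
    bounded by window * chunk_size instead of by len(data).
    """
    view = memoryview(data)
    pending = deque()
    with ThreadPoolExecutor(max_workers=window) as pool:
        for offset in range(0, len(view), chunk_size):
            if len(pending) >= window:
                # Window is full: wait for the oldest posted write.
                pending.popleft().result()
            chunk = view[offset:offset + chunk_size]
            pending.append(pool.submit(_pwrite_all, fd, chunk, offset))
        # Drain whatever is still outstanding.
        for future in pending:
            future.result()
```

With `window=1` this degenerates into fully synchronous chunk-by-chunk writing, which corresponds to the simpler behavior the patch is described as using.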
Re: NFS client/buffer cache deadlock
On Tue, Apr 26, 2005 at 03:36:02PM -0400, Brian Fundakowski Feldman wrote:
> I'm still guessing that for whatever reason your writes on the FreeBSD 4.x NFS client are not using NFSv3/transactions. The second method I just now implemented; it works fine except for being slower since all data is acknowledged synchronously. Are you using one writev() instead of many writes so you can atomically write a large sparse data structure? If so, you will probably just have to cope with the lower performance than for reasonably-sized writes. If not: why are you trying to write it atomically? Just use multiple normal-sized write() calls.

Yes, a single writev(). Just like in the kern/79207 PR. It doesn't have to be superfast (why would I use NFS otherwise), just as long as it's threadsafe / atomic.

> > Btw. running the writev program with 20 * 100 MB on UFS on a 512MB FreeBSD 6-CURRENT system practically locks the filesystem down _and_ causes all processes to be swapped out in favor of the buffer cache. 'top' however, doesn't show a rise in BUF usage. On FreeBSD 4.x, the system performs as usual during the writev to UFS.
>
> That's certainly not very optimal. I don't know anything about it, sorry.

No problem. I just thought that they might be related (this is with a single writev() as well).

Marc
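
For readers unfamiliar with the two approaches being contrasted here, a minimal Python sketch follows. It is not the test program from the kern/79207 PR; the function names and the 8192-byte chunk size are assumptions for illustration only.

```python
import os


def write_single_writev(fd, buffers):
    """Hand all buffers to the kernel in one writev() call.

    Returns the number of bytes written, which may in principle be
    smaller than the total size of the buffers.
    """
    return os.writev(fd, buffers)


def write_in_chunks(fd, data, chunk_size=8192):
    """Write the same data with many normal-sized write() calls.

    Each write() may be short, so the loop advances by the number of
    bytes actually written.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        total += os.write(fd, view[total:total + chunk_size])
    return total
```

The chunked loop keeps each request small, but other writers to the same file can interleave between the individual write() calls, which is exactly the property Marc says he cannot give up.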
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 10:17:46AM +0200, Marc Olzheim wrote:
> On Tue, Apr 26, 2005 at 03:36:02PM -0400, Brian Fundakowski Feldman wrote:
> > I'm still guessing that for whatever reason your writes on the FreeBSD 4.x NFS client are not using NFSv3/transactions. The second method I just now implemented; it works fine except for being slower since all data is acknowledged synchronously. Are you using one writev() instead of many writes so you can atomically write a large sparse data structure? If so, you will probably just have to cope with the lower performance than for reasonably-sized writes. If not: why are you trying to write it atomically? Just use multiple normal-sized write() calls.
>
> Yes, a single writev(). Just like in the kern/79207 PR. It doesn't have to be superfast (why would I use NFS otherwise), just as long as it's threadsafe / atomic.

Alright, this will do synchronous, instead of short, writes (also, of course, not deadlock the system) if you are trying to use an excessively large buffer size.
http://green.homeunix.org/~green/nfs_client.deadlock.patch
http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch

--
Brian Fundakowski Feldman
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
> > Yes, a single writev(). Just like in the kern/79207 PR. It doesn't have to be superfast (why would I use NFS otherwise), just as long as it's threadsafe / atomic.
>
> Alright, this will do synchronous, instead of short, writes (also, of course, not deadlock the system) if you are trying to use an excessively large buffer size.
> http://green.homeunix.org/~green/nfs_client.deadlock.patch
> http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch

Great! This seems to do the trick and isn't that slow (about 2.8 MB/sec over 100 MBit, writing 600 * 1MB, 4.x gets about 5.5 MB/sec on the same system); it's fast enough for me and more importantly, it doesn't lock the system down anymore. ;-)

Thanks a lot!

Marc
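
To put the two throughput figures in this message side by side, a short worked calculation in Python follows. The rates and the 600 * 1MB total come from the message; treating 100 MBit as 12.5 MB/sec ignores all protocol overhead and is a simplifying assumption.

```python
total_mb = 600        # "writing 600 * 1MB"
patched_rate = 2.8    # MB/sec with the patch applied
rate_4x = 5.5         # MB/sec on FreeBSD 4.x, same system

# Assumption: raw link capacity, no Ethernet/IP/RPC overhead counted.
link_rate = 100 / 8   # 100 Mbit/sec expressed in MB/sec

patched_seconds = total_mb / patched_rate
seconds_4x = total_mb / rate_4x

print(f"patched: {patched_seconds:.0f} s, 4.x: {seconds_4x:.0f} s")
print(f"patched client uses {patched_rate / link_rate:.0%} of the raw link rate")
```

So the synchronous path takes roughly twice as long as 4.x for the same 600 MB, which matches Marc's description of it as slower but acceptable.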
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 07:15:23PM +0200, Marc Olzheim wrote:
> On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
> > > Yes, a single writev(). Just like in the kern/79207 PR. It doesn't have to be superfast (why would I use NFS otherwise), just as long as it's threadsafe / atomic.
> >
> > Alright, this will do synchronous, instead of short, writes (also, of course, not deadlock the system) if you are trying to use an excessively large buffer size.
> > http://green.homeunix.org/~green/nfs_client.deadlock.patch
> > http://green.homeunix.org/~green/nfs_client.deadlock.HEAD.patch
>
> Great! This seems to do the trick and isn't that slow (about 2.8 MB/sec over 100 MBit, writing 600 * 1MB, 4.x gets about 5.5 MB/sec on the same system); it's fast enough for me and more importantly, it doesn't lock the system down anymore. ;-)
>
> Thanks a lot!

Alright, thanks for helping with this :-) Do you think you can find a way to tell if in 4.x you're actually using NFSv3/transactions? I would really like to know why 4.x isn't deadlocking, and that's the most plausible explanation I can think of right now.

The behavior could also be totally different due to nfsiod, lack thereof, or its settings. Are you running nfsiod on 4.x, and if so, how many? On 5.x+ the default maximum of nfsiods is 20.

--
Brian Fundakowski Feldman
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 02:34:35PM -0400, Brian Fundakowski Feldman wrote:
> Alright, thanks for helping with this :-) Do you think you can find a way to tell if in 4.x you're actually using NFSv3/transactions? I would really like to know why 4.x isn't deadlocking, and that's the most plausible explanation I can think of right now.
>
> The behavior could also be totally different due to nfsiod, lack thereof, or its settings. Are you running nfsiod on 4.x, and if so, how many? On 5.x+ the default maximum of nfsiods is 20.

The install I ran the speedtest on is:

FreeBSD baroque.ilse.net 4.10-STABLE FreeBSD 4.10-STABLE #23: Wed Aug 4 15:18:52 CEST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BAROQUE i386

and runs 6 nfsiods. How can I tell whether it uses transactions?

The NFS server is mounted with:

mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path /nfs/host/path

Marc
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 08:42:03PM +0200, Marc Olzheim wrote:
> On Wed, Apr 27, 2005 at 02:34:35PM -0400, Brian Fundakowski Feldman wrote:
> > Alright, thanks for helping with this :-) Do you think you can find a way to tell if in 4.x you're actually using NFSv3/transactions? I would really like to know why 4.x isn't deadlocking, and that's the most plausible explanation I can think of right now.
> >
> > The behavior could also be totally different due to nfsiod, lack thereof, or its settings. Are you running nfsiod on 4.x, and if so, how many? On 5.x+ the default maximum of nfsiods is 20.
>
> The install I ran the speedtest on is:
>
> FreeBSD baroque.ilse.net 4.10-STABLE FreeBSD 4.10-STABLE #23: Wed Aug 4 15:18:52 CEST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BAROQUE i386
>
> and runs 6 nfsiods. How can I tell whether it uses transactions?

I am not sure -- it should with NFSv3 though. Does mount -v tell you anything more detailed? I suppose that nfsstat may also be used -- the commit count should only increase when it does an NFSv3 commit operation.

> The NFS server is mounted with:
>
> mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path /nfs/host/path

So the only really visible difference is 6 nfsiod versus 20 nfsiod. I think if you increased the nfsiod count on that 4.x box to something much higher it would deadlock the same way, then, assuming that all else is equal.

--
Brian Fundakowski Feldman
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 03:06:27PM -0400, Brian Fundakowski Feldman wrote:
> > How can I tell whether it uses transactions?
>
> I am not sure -- it should with NFSv3 though. Does mount -v tell you anything more detailed? I suppose that nfsstat may also be used -- the commit count should only increase when it does an NFSv3 commit operation.
>
> > The NFS server is mounted with:
> >
> > mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path /nfs/host/path
>
> So the only really visible difference is 6 nfsiod versus 20 nfsiod. I think if you increased the nfsiod count on that 4.x box to something much higher it would deadlock the same way, then, assuming that all else is equal.

Well, I took on that challenge and although it seems that the machine completely hangs itself up for a while, it snaps out of it after it is done.

tethereal tells me it is NFSv3, and the commits increase:

baroque:/se3/spiderbak>nfsstat
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
15272 20119 3401842 82797 548076917594394 43551 42697
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
1307727 11573 2957 2959 17178 0 24938388
Mknod Fsstat Fsinfo PathConf Commit GLease Vacate Evict
0 6178814 0191593 0 0 0
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0210711330710 577517143
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
376190541 24213879 94021997 3401840 325304623 545060579699854594394
BioRLHits Misses BioD Hits Misses DirE Hits Misses
3051125 82797 88340 17145 42013 0
Server Info:
Getattr Setattr Lookup Readlink Read Write Create Remove
0 0 0 0 0 0 0 0
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
0 0 0 0 0 0 0 0
Mknod Fsstat Fsinfo PathConf Commit GLease Vacate Evict
0 0 0 0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
Inprog Idem Non-idem Misses
0 0 0 0
Server Lease Stats:
Leases PeakL GLeases
0 0 0
Server Write Gathering:
WriteOps WriteRPC Opsaved
0 0 0
baroque:/se3/spiderbak>/home/marcolz/src/writev 100 foo0 nfsstat
baroque:/se3/spiderbak>nfsstat
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
15272 20119 3401879 82797 548082677607194 43552 42697
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
1307727 11573 2957 2959 17178 0 24938698
Mknod Fsstat Fsinfo PathConf Commit GLease Vacate Evict
0 6178814 0193252 0 0 0
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0210711331410 577537710
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
376195207 24214181 94022936 3401877 325306477 545066339699854607194
BioRLHits Misses BioD Hits Misses DirE Hits Misses
3051125 82797 88340 17145 42013 0
Server Info:
Getattr Setattr Lookup Readlink Read Write Create Remove
0 0 0 0 0 0 0 0
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
0 0 0 0 0 0 0 0
Mknod Fsstat Fsinfo PathConf Commit GLease Vacate Evict
0 0 0 0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
Inprog Idem Non-idem Misses
0 0 0 0
Server Lease Stats:
Leases PeakL GLeases
0 0 0
Server Write Gathering:
WriteOps WriteRPC Opsaved
0 0 0
baroque:/se3/spiderbak>

Marc
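
The two nfsstat runs quoted above can be compared with a little arithmetic. The sketch below is in Python and rests on several assumptions that the thread does not state outright: that "writev 100" wrote 100 * 1 MB, that the token 0191593 (and 0193252 in the second run) is PathConf 0 fused with the Commit counter, and that the last six digits of the fused Read/Write token belong to the Write counter.

```python
# Commit counter before and after the writev run
# (assumed reading of the fused tokens 0191593 and 0193252).
commit_before = 191593
commit_after = 193252

# Assumption: "writev 100" means 100 * 1 MB, written through a mount
# with an 8192-byte write size ("-w 8192" in the mount_nfs command).
bytes_written = 100 * 1024 * 1024
wsize = 8192
expected_write_rpcs = bytes_written // wsize

# Trailing digits of the fused Read/Write token in each run
# (assumed to be the low-order digits of the Write counter).
write_tail_before = 594394
write_tail_after = 607194
write_delta = write_tail_after - write_tail_before

commit_delta = commit_after - commit_before

print(expected_write_rpcs, write_delta, commit_delta)
# prints: 12800 12800 1659
```

Under those assumptions the Write counter grew by exactly the number of 8192-byte RPCs needed for 100 MB, and the Commit counter grew by 1659, which is consistent with Marc's observation that the 4.x client is issuing NFSv3 commits.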
[5.3Rp6] UFS buffer cache deadlock (was: NFS client/buffer cache deadlock)
On Wed, Apr 27, 2005 at 10:17:46AM +0200, Marc Olzheim wrote:
> > > Btw. running the writev program with 20 * 100 MB on UFS on a 512MB FreeBSD 6-CURRENT system practically locks the filesystem down _and_ causes all processes to be swapped out in favor of the buffer cache. 'top' however, doesn't show a rise in BUF usage. On FreeBSD 4.x, the system performs as usual during the writev to UFS.
> >
> > That's certainly not very optimal. I don't know anything about it, sorry.
>
> No problem. I just thought that they might be related (this is with a single writev() as well).

Hmm, doing the 20 * 100MB on a 5.3-RELEASE-p6 system with 256 MB memory and 512MB swap (on UFS) hangs the machine. Luckily, I cannot reproduce it on 5.4-STABLE :-/

Marc
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 09:19:38PM +0200, Marc Olzheim wrote:
> On Wed, Apr 27, 2005 at 03:06:27PM -0400, Brian Fundakowski Feldman wrote:
> > > How can I tell whether it uses transactions?
> >
> > I am not sure -- it should with NFSv3 though. Does mount -v tell you anything more detailed? I suppose that nfsstat may also be used -- the commit count should only increase when it does an NFSv3 commit operation.
> >
> > > The NFS server is mounted with:
> > >
> > > mount_nfs -c -P -a 0 -r 8192 -w 8192 -i -o rw,noatime,nointr host:/path /nfs/host/path
> >
> > So the only really visible difference is 6 nfsiod versus 20 nfsiod. I think if you increased the nfsiod count on that 4.x box to something much higher it would deadlock the same way, then, assuming that all else is equal.
>
> Well, I took on that challenge and although it seems that the machine completely hangs itself up for a while, it snaps out of it after it is done.
>
> tethereal tells me it is NFSv3, and the commits increase:
> []

Alright, thanks for testing. So the same bug exists in 4.x but may not actually deadlock the buffer cache in the same circumstances. If you feel like playing with it any more, I would expect that a combination of write block size, maximum nfsiod count, and total file size that deadlocks 4.x, too, would not be too hard to find.

--
Brian Fundakowski Feldman
Re: NFS client/buffer cache deadlock
On Wed, Apr 27, 2005 at 03:41:44PM -0400, Brian Fundakowski Feldman wrote:
> Alright, thanks for testing. So the same bug exists in 4.x but may not actually deadlock the buffer cache in the same circumstances. If you feel like playing with it any more, I would expect that a combination of write block size, maximum nfsiod count, and total file size that deadlocks 4.x, too, would not be too hard to find.

20 is the maximum number it is willing to start. At 21 and above it tells me:

baroque:~# nfsiod -n 21
nfsiod: nfsiod count 21; reset to 1
baroque:~#

:-P

Btw.: CPU time was nicely divided amongst all of the nfsiods when doing the writev:

root 87373 0.0 0.0 212 32 ?? I 9:11PM 0:00.18 nfsiod -n 20
root 87372 0.0 0.0 212 32 ?? I 9:11PM 0:00.16 nfsiod -n 20
root 87371 0.0 0.0 212 32 ?? I 9:11PM 0:00.16 nfsiod -n 20
root 87370 0.0 0.0 212 32 ?? I 9:11PM 0:00.18 nfsiod -n 20
root 87369 0.0 0.0 212 32 ?? I 9:11PM 0:00.18 nfsiod -n 20
root 87368 0.0 0.0 212 32 ?? I 9:11PM 0:00.19 nfsiod -n 20
root 87367 0.0 0.0 212 32 ?? I 9:11PM 0:00.18 nfsiod -n 20
root 87366 0.0 0.0 212 32 ?? I 9:11PM 0:00.18 nfsiod -n 20
root 87365 0.0 0.0 212 32 ?? I 9:11PM 0:00.17 nfsiod -n 20
root 87364 0.0 0.0 212 32 ?? I 9:11PM 0:00.17 nfsiod -n 20
root 87363 0.0 0.0 212 32 ?? I 9:11PM 0:00.19 nfsiod -n 20
root 87362 0.0 0.0 212 32 ?? I 9:11PM 0:00.19 nfsiod -n 20
root 87361 0.0 0.0 212 32 ?? I 9:11PM 0:00.20 nfsiod -n 20
root 87360 0.0 0.0 212 32 ?? I 9:11PM 0:00.20 nfsiod -n 20
root 87359 0.0 0.0 212 32 ?? I 9:11PM 0:00.20 nfsiod -n 20
root 87358 0.0 0.0 212 32 ?? I 9:11PM 0:00.19 nfsiod -n 20
root 87357 0.0 0.0 212 32 ?? I 9:11PM 0:00.19 nfsiod -n 20
root 87356 0.0 0.0 212 32 ?? I 9:11PM 0:00.21 nfsiod -n 20
root 87355 0.0 0.0 212 32 ?? I 9:11PM 0:00.21 nfsiod -n 20
root 87354 0.0 0.0 212 32 ?? I 9:11PM 0:00.23 nfsiod -n 20

Marc
Re: NFS client/buffer cache deadlock
[changed cc: from standards@ back to stable@ again.]

On Tue, Apr 26, 2005 at 12:25:49PM -0400, Brian Fundakowski Feldman wrote:
> You can ensure that this happens in only two ways:
> 1. Make a complete copy of the data. This is what currently occurs: it gets stuffed into the buffer cache as the write happens.
> 2. Keep the data around synchronously -- by virtue of the write system call being used synchronously, the thread's VM context is around, and duplication need not occur.

It seems as though FreeBSD 4.x either used 2) or does something wrong indeed. Why would 2) be a problem on FreeBSD 5.x? Can't the pages being written from be locked during the write, instead of copied internally?

Btw. running the writev program with 20 * 100 MB on UFS on a 512MB FreeBSD 6-CURRENT system practically locks the filesystem down _and_ causes all processes to be swapped out in favor of the buffer cache. 'top' however, doesn't show a rise in BUF usage. On FreeBSD 4.x, the system performs as usual during the writev to UFS.

Marc
Re: NFS client/buffer cache deadlock
On Tue, Apr 26, 2005 at 06:43:46PM +0200, Marc Olzheim wrote:
> [changed cc: from standards@ back to stable@ again.]
>
> On Tue, Apr 26, 2005 at 12:25:49PM -0400, Brian Fundakowski Feldman wrote:
> > You can ensure that this happens in only two ways:
> > 1. Make a complete copy of the data. This is what currently occurs: it gets stuffed into the buffer cache as the write happens.
> > 2. Keep the data around synchronously -- by virtue of the write system call being used synchronously, the thread's VM context is around, and duplication need not occur.
>
> It seems as though FreeBSD 4.x either used 2) or does something wrong indeed. Why would 2) be a problem on FreeBSD 5.x? Can't the pages being written from be locked during the write, instead of copied internally?

I'm still guessing that for whatever reason your writes on the FreeBSD 4.x NFS client are not using NFSv3/transactions. The second method I just now implemented; it works fine except for being slower since all data is acknowledged synchronously. Are you using one writev() instead of many writes so you can atomically write a large sparse data structure? If so, you will probably just have to cope with the lower performance than for reasonably-sized writes. If not: why are you trying to write it atomically? Just use multiple normal-sized write() calls.

> Btw. running the writev program with 20 * 100 MB on UFS on a 512MB FreeBSD 6-CURRENT system practically locks the filesystem down _and_ causes all processes to be swapped out in favor of the buffer cache. 'top' however, doesn't show a rise in BUF usage. On FreeBSD 4.x, the system performs as usual during the writev to UFS.

That's certainly not very optimal. I don't know anything about it, sorry.

--
Brian Fundakowski Feldman