Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
>> It's worth reminding that -o tcp is an option.
> Not for NFS through a (stateful) filtering router, no.

True. But then, not over network hops that drop port 2049, either. Break the assumptions underlying the 'net and you have to expect breakage from stuff built atop it.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!             7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
At 19:14 Uhr, 31.07.2019, m...@netbsd.org wrote:
> It's worth reminding that -o tcp is an option.

Not for NFS through a (stateful) filtering router, no. Reboot the router, and you will have to walk up to every client and reboot it. With NFS over UDP, the clients will recover.

Cheerio,
hauke

--
"It's never straight up and down" (DEVO)
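For context, the two configurations being weighed here look roughly like this on a NetBSD client (server name and mount point are placeholders; see mount_nfs(8) for the exact option spelling on your release):

    # default: NFS over UDP, hard mount (recovers once lost replies/fragments
    # get through again, but wedges in tstile while they are blocked)
    mount_nfs server:/export /mnt

    # NFS over TCP: no IP fragmentation of large replies, but a stateful
    # packet filter in the path can leave the mount permanently wedged
    # after it loses its connection state (e.g. a router reboot)
    mount_nfs -o tcp server:/export /mnt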
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
At 10:45 Uhr +0200 31.07.2019, Edgar Fuß wrote:
> Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS
> mount combined with a mis-configured IPFilter that blocked all but the
> first fragment of a fragmented NFS reply (e.g., readdir) combined with a
> NetBSD design error (or so Taylor says) that a vnode lock may be held
> across I/O, in this case, network I/O.

I ran into a similar issue 2004ish, connecting RedHat Linux clients to a (NetBSD) NFS (UDP) server through a (NetBSD, IPFilter) filtering router. Darren back then told me Linux sends fragmented packets tail-first, which IPFilter was not prepared to deal with. I switched to pf, which was able to deal with the scenario just fine, and didn't look back.

Cheerio,
hauke

--
"It's never straight up and down" (DEVO)
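For anyone hitting the same thing today, the usual fix on the filter side is to reassemble (or explicitly track) fragments rather than matching each fragment on its own. A rough sketch, assuming NFS over UDP on port 2049; the exact rules will depend on your configuration:

    # pf.conf: reassemble fragments before filtering, so out-of-order
    # (tail-first) fragments are handled correctly
    scrub in all fragment reassemble

    # ipf.conf alternative: keep fragment state alongside the UDP state
    pass in quick proto udp from any to any port = 2049 keep state keep frags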
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
On Wed, Jul 31, 2019 at 07:11:54AM -0700, Jason Thorpe wrote:
> > On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
> >
> > NetBSD design error (or so Taylor says) that a vnode lock may be held
> > across I/O
>
> 100%
>
> NetBSD's VFS locking protocol needs a serious overhaul. At least one other
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at
> the VFS layer (it's all pushed into the file system back-ends who now have
> more control over their own locking requirements). It does have some
> additional complexities around reference / busy counting and vnode identity,
> but it works very well in practice.
>
> I don't know what FreeBSD has done in this area.
>
> -- thorpej

IMNT_MPSAFE, which NFS isn't?
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
On Wed, Jul 31, 2019 at 11:42:26AM -0500, Don Lee wrote:
> If you go back a few years, you can find a thread where I reported tstile
> lockups on PPC. I don’t remember the details, but it was back in 6.1 as I
> recall. This is not a new problem, and not limited to NFS. I still have a
> similar problem with my 7.2 system, usually triggered when I do backups
> (dump/restore). The dump operation locks up and cannot be killed. The system
> continues, except any process that trips over the tstile also locks up.
> Eventually, the system grinds to a complete halt. (can’t even log in) If I
> catch it before that point, I can almost reboot, but I have to power cycle to
> kill the tstile process(es), or the reboot also hangs.

It's worth reminding that -o tcp is an option.
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
If you go back a few years, you can find a thread where I reported tstile lockups on PPC. I don’t remember the details, but it was back in 6.1 as I recall. This is not a new problem, and not limited to NFS. I still have a similar problem with my 7.2 system, usually triggered when I do backups (dump/restore). The dump operation locks up and cannot be killed. The system continues, except any process that trips over the tstile also locks up. Eventually, the system grinds to a complete halt. (can’t even log in) If I catch it before that point, I can almost reboot, but I have to power cycle to kill the tstile process(es), or the reboot also hangs.

-dgl-

> On Jul 31, 2019, at 9:11 AM, Jason Thorpe wrote:
>
>> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>>
>> NetBSD design error (or so Taylor says) that a vnode lock may be held
>> across I/O
>
> 100%
>
> NetBSD's VFS locking protocol needs a serious overhaul. At least one other
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at
> the VFS layer (it's all pushed into the file system back-ends who now have
> more control over their own locking requirements). It does have some
> additional complexities around reference / busy counting and vnode identity,
> but it works very well in practice.
>
> I don't know what FreeBSD has done in this area.
>
> -- thorpej
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>
> NetBSD design error (or so Taylor says) that a vnode lock may be held
> across I/O

100%

NetBSD's VFS locking protocol needs a serious overhaul. At least one other BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at the VFS layer (it's all pushed into the file system back-ends who now have more control over their own locking requirements). It does have some additional complexities around reference / busy counting and vnode identity, but it works very well in practice.

I don't know what FreeBSD has done in this area.

-- thorpej
NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS mount combined with a mis-configured IPFilter that blocked all but the first fragment of a fragmented NFS reply (e.g., readdir), combined with a NetBSD design error (or so Taylor says) that a vnode lock may be held across I/O, in this case, network I/O. It should be reproducible with a default NFS mount, a "block in all with frag-body" IPFilter rule, and then trying to readdir.

Now, in some cases, the machine in question recovered after fixing the filter rules; in others, it didn't, forcing a reboot. This strikes me as a bug, because the same lock-up could just as well have been caused by network problems instead of IPFilter mis-configuration. It looks like the operation whose reply was lost sometimes doesn't get retried. Do we have some weird bug where the first fragment arriving stops the timeout, but the blocking of the remaining fragments causes it to wedge?
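For anyone who wants to reproduce this, a rough sketch of the setup described above (server name, mount point and directory are placeholders; the readdir trigger can be anything that lists a large directory):

    # on the client: default (UDP, hard) NFS mount
    mount_nfs server:/export /mnt

    # add the blocking rule to /etc/ipf.conf on the client
    # (or on an intervening router):
    #   block in all with frag-body
    # then reload the rules:
    ipf -Fa -f /etc/ipf.conf

    # trigger a reply large enough to be fragmented, e.g. a readdir
    # of a big directory:
    ls -l /mnt/some-large-directory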
Re: 8.1 tstile lockup after nfs send error 51
Here are stack traces of all the frozen processes (with a few newlines inserted manually):

Crash version 8.1_STABLE, image version 8.1_STABLE.
Output from a running system is unreliable.
crash> trace/t 0t16306
trace: pid 16306 lid 1 at 0x8001578daa20
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_root() at nfs_root+0x2b
VFS_ROOT() at VFS_ROOT+0x35
lookup_once() at lookup_once+0x250
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___stat50() at sys___stat50+0x28
syscall() at syscall+0x1ec
--- syscall (number 439) ---
787fc74f98ca:
crash> trace/t 0t16685
trace: pid 16685 lid 1 at 0x80015791aae0
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_root() at nfs_root+0x2b
VFS_ROOT() at VFS_ROOT+0x35
lookup_once() at lookup_once+0x250
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
chdir_lookup() at chdir_lookup+0x70
sys_chdir() at sys_chdir+0x26
syscall() at syscall+0x1ec
--- syscall (number 12) ---
758c3b8820ca:
crash> trace/t 0t19979
trace: pid 19979 lid 1 at 0x80013ed7ea20
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_root() at nfs_root+0x2b
VFS_ROOT() at VFS_ROOT+0x35
lookup_once() at lookup_once+0x250
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___lstat50() at sys___lstat50+0x25
syscall() at syscall+0x1ec
--- syscall (number 441) ---
7c0a9c4a1a7a:
crash> trace/t 0t25025
trace: pid 25025 lid 1 at 0x8001576628a0
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_lookup() at nfs_lookup+0xf84
VOP_LOOKUP() at VOP_LOOKUP+0x85
lookup_once() at lookup_once+0x18d
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___stat50() at sys___stat50+0x28
syscall() at syscall+0x1ec
--- syscall (number 439) ---
6fd53d4f98ca:
crash> trace/t 0t28321
trace: pid 28321 lid 1 at 0x80013f8768a0
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_lookup() at nfs_lookup+0xf84
VOP_LOOKUP() at VOP_LOOKUP+0x85
lookup_once() at lookup_once+0x18d
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___stat50() at sys___stat50+0x28
syscall() at syscall+0x1ec
--- syscall (number 439) ---
7656ab6f98ca:
crash> trace/t 0t18618
trace: pid 18618 lid 1 at 0x80013f9b2a20
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_root() at nfs_root+0x2b
VFS_ROOT() at VFS_ROOT+0x35
lookup_once() at lookup_once+0x250
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___lstat50() at sys___lstat50+0x25
syscall() at syscall+0x1ec
--- syscall (number 441) ---
7cbc7f0a1a7a:
crash> trace/t 0t22881
trace: pid 22881 lid 1 at 0x80013eef6a20
sleepq_block() at sleepq_block+0x97
turnstile_block() at turnstile_block+0x24f
rw_vector_enter() at rw_vector_enter+0x120
genfs_lock() at genfs_lock+0x47
VOP_LOCK() at VOP_LOCK+0x80
vn_lock() at vn_lock+0x11
nfs_root() at nfs_root+0x2b
VFS_ROOT() at VFS_ROOT+0x35
lookup_once() at lookup_once+0x250
namei_tryemulroot.constprop.9() at namei_tryemulroot.constprop.9+0x3b1
namei() at namei+0x43
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54
do_sys_statat() at do_sys_statat+0x77
sys___stat50() at sys___stat50+0x28
syscall() at syscall+0x1ec
--- syscall (number 439) ---
792b4e4f98ca:
8.1 tstile lockup after nfs send error 51
I experienced an "nfs send error 51" on an NFS-imported file system, and after that, any process accessing that FS seems to be frozen in tstile. Any way out short of rebooting? Anything to analyze before?
Re: tstile lockup
On 11/23/2012 05:06 PM, Edgar Fuß wrote:
>> Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get more
>> information on lock congestion.
> Ouch! I overlooked this post of yours until a colleague asked me about it.
>
> Elapsed time: 18.33 seconds.
>
> -- RW lock sleep (reader)
>
> Total%  Count   Time/ms  Lock              Caller
> ------  -----  --------  ----------------  --------------
> 100.00      1  11696.42  fe8419e266f8      wapbl_begin+54
>
> Does that help?

Hi,

there is one rwlock that is taken in read mode in wapbl_begin(), and that same lock is held in writer mode across log flushing. So while a transaction is being flushed, further scheduling of transaction data is on hold; that might be the cause.

Lars

--
Mystical explanations: Mystical explanations are considered deep; the truth is that they are not even superficial.
-- Friedrich Nietzsche [ Die Fröhliche Wissenschaft, Book 3, 126 ]
Re: tstile lockup
Can you use addr2line with that wapbl address to find out what line it is?

On Nov 23, 2012 5:06 PM, "Edgar Fuß" wrote:
> > Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get
> > more information on lock congestion.
> Ouch! I overlooked this post of yours until a colleague asked me about it
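The address lockstat reports is the lock, so what addr2line actually wants is the caller, wapbl_begin plus the offset (which I would read as hex). Roughly, assuming a kernel built with debug symbols; the symbol address and the netbsd.gdb path below are made up for illustration:

    # find the address of wapbl_begin in the running kernel's symbol table
    nm /netbsd | grep ' wapbl_begin$'
    # suppose it prints: ffffffff805a1c20 T wapbl_begin
    # add the 0x54 offset from lockstat and resolve it against the
    # debug kernel:
    addr2line -f -e /path/to/netbsd.gdb 0xffffffff805a1c74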
Re: tstile lockup
> Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get more
> information on lock congestion.
Ouch! I overlooked this post of yours until a colleague asked me about it.

Elapsed time: 18.33 seconds.

-- RW lock sleep (reader)

Total%  Count   Time/ms  Lock              Caller
------  -----  --------  ----------------  --------------
100.00      1  11696.42  fe8419e266f8      wapbl_begin+54

Does that help?
Re: tstile lockup
On Nov 19, 2012, at 6:40 PM, Edgar Fuß wrote:
>> Do you get a deadlock
> No.
>
>> will the system come back to work after some time?
> Yes. At least for appropriate values of "some time".
> This may take minutes (at least in the dd case, I haven't seen this in the
> svn case).

Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get more information on lock congestion.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: tstile lockup
> Do you get a deadlock
No.

> will the system come back to work after some time?
Yes. At least for appropriate values of "some time".
This may take minutes (at least in the dd case, I haven't seen this in the svn case).
Re: tstile lockup
On Nov 19, 2012, at 4:53 PM, Edgar Fuß wrote:
> OK, this is the svn process (directly running on the file server, not
> operating via NFS) tstile-ing:
>
> crash> ps | grep "\(vnode\|tstile\)"
> 25051   1  3   0   0  fe82ec17d200          svn  tstile
> crash> t/a fe82ec17d200
> trace: pid 25051 lid 1 at 0xfe811e901700
> sleepq_block() at sleepq_block+0xad
> turnstile_block() at turnstile_block+0x2bb
> rw_vector_enter() at rw_vector_enter+0x1eb
> wapbl_begin() at wapbl_begin+0x54
> rw_vector_enter() at rw_vector_enter+0x1eb
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x32
> vn_lock() at vn_lock+0x7d
> vget() at vget+0xa2
> ufs_ihashget() at ufs_ihashget+0x84
> ffs_vget() at ffs_vget+0xc1
> ufs_fhtovp() at ufs_fhtovp+0x1f
> ffs_fhtovp() at ffs_fhtovp+0x55
> VFS_FHTOVP() at VFS_FHTOVP+0x1c
> nfsrv_fhtovp() at nfsrv_fhtovp+0x9a
> nfsrv_getattr() at nfsrv_getattr+0x13a
> nfssvc_nfsd() at nfssvc_nfsd+0x1d7
> sys_nfssvc() at sys_nfssvc+0x22c
> syscall() at syscall+0xc4
> crash> ps | grep "\(vnode\|tstile\)"
>   442   4  3   5   0  fe841628c060        slave  tstile
>
> These are somewhat difficult to catch in the act since you have to copy-paste
> the address from the ps output into the t command.
>
> Now, does that help to diagnose the problem?

Ok, so you see threads waiting for the wapbl rwlock and some vnode rwlock. This is ok as long as it doesn't deadlock. Do you get a deadlock or will the system come back to work after some time?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: tstile lockup
OK, this is the svn process (directly running on the file server, not operating via NFS) tstile-ing:

crash> ps | grep "\(vnode\|tstile\)"
25051   1  3   0   0  fe82ec17d200          svn  tstile
crash> t/a fe82ec17d200
trace: pid 25051 lid 1 at 0xfe811e901700
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
wapbl_begin() at wapbl_begin+0x54
ffs_valloc() at ffs_valloc+0x5e
ufs_makeinode() at ufs_makeinode+0x62
ufs_create() at ufs_create+0x5c
VOP_CREATE() at VOP_CREATE+0x38
vn_open() at vn_open+0x2ed
do_open() at do_open+0xcd
sys_open() at sys_open+0x59
syscall() at syscall+0xc4
crash> ps | grep "\(vnode\|tstile\)"
25051   1  3   0   0  fe82ec17d200          svn  tstile

And this is both the local svn and one nfsd slave tstile-ing:

crash> ps | grep "\(vnode\|tstile\)"
10532   1  3   0   0  fe82eca5d5c0          svn  tstile
  442   4  3   5   0  fe841628c060        slave  tstile
crash> t/a fe841628c060
trace: pid 442 lid 4 at 0xfe811dbfe4e0
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x32
vn_lock() at vn_lock+0x7d
vget() at vget+0xa2
ufs_ihashget() at ufs_ihashget+0x84
ffs_vget() at ffs_vget+0xc1
ufs_fhtovp() at ufs_fhtovp+0x1f
ffs_fhtovp() at ffs_fhtovp+0x55
VFS_FHTOVP() at VFS_FHTOVP+0x1c
nfsrv_fhtovp() at nfsrv_fhtovp+0x9a
nfsrv_getattr() at nfsrv_getattr+0x13a
nfssvc_nfsd() at nfssvc_nfsd+0x1d7
sys_nfssvc() at sys_nfssvc+0x22c
syscall() at syscall+0xc4
crash> ps | grep "\(vnode\|tstile\)"
  442   4  3   5   0  fe841628c060        slave  tstile

These are somewhat difficult to catch in the act since you have to copy-paste the address from the ps output into the t command.

Now, does that help to diagnose the problem?
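If crash(8) accepts commands on standard input (I believe it does, but I haven't verified), something like this might save the copy-paste step; the awk field number for the LWP address is a guess and would need adjusting to the actual ps column layout:

    # dump a trace for every LWP that ps reports sleeping on tstile
    echo ps | crash | awk '/tstile/ { print "t/a", $6 }' | crash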
Re: tstile lockup
On Mon, Nov 19, 2012 at 12:59:13PM +0100, Edgar Fuß wrote:
> Should I try to get crash(8) outputs of the "real world" situation?

I guess that would be good - even if only to verify this is related or not.

Martin
Re: tstile lockup
> Why do you think both lockups are related?

Because the "real world" problem also involves large amounts of metadata being written and also results in nfsd's stuck in tstile.

Should I try to get crash(8) outputs of the "real world" situation?
Re: tstile lockup
On Mon, Nov 19, 2012 at 12:31:47PM +0100, Edgar Fuß wrote:
> The problem is that this "lock-up", artificial as the dd to the block device
> may seem, appears to happen "real-world" during an svn update command: the
> other nfsd threads get stuck to the point where other clients get "nfs server
> not responding" messages.

Why do you think both lockups are related?

Martin
Re: tstile lockup
On Wed, Oct 31, 2012 at 05:42:12PM +0100, Edgar Fuß wrote:
> > Invoke crash(8), then just perform "ps" and "t/a address" on each LWP
> > which seems to be stuck (on "tstile" or elsewhere).
> So it seems I can sort of lock up the machine for minutes with a simple
>     dd if=/dev/zero of=/dev/dk14 bs=64k count=1000
> (In case it matters, dk14 is on a RAID5 on 4+1 mpt(4) SAS discs)
> I get an impressive dd throughput of around 350kB/s
>
> Here's the crash(8) output during one of the partial lock-ups:
[...]
> Does that help?

Apparently not. What can I do to debug this?

The problem is that this "lock-up", artificial as the dd to the block device may seem, appears to happen "real-world" during an svn update command: the other nfsd threads get stuck to the point where other clients get "nfs server not responding" messages.
Re: RAIDframe level 5 write performance (was: tstile lockup)
On Fri, Nov 02, 2012 at 06:02:01PM +0100, Edgar Fuß wrote:
> Writing to that RAID's block device (raid2d) in 64k blocks gives me a dazzling
> throughput of 2.4MB/s and a dd mostly waiting in vnode.

Writing to the block device from userspace is not a good idea. How is performance through the filesystem?

The "block device" actually accesses the metadata cache. It is not intended for bulk writes and allocates memory in the kernel at every request. It's fine for what the filesystem needs it for, but not for what you're trying to use it for.

Thor
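A rough way to run the comparison Thor suggests, on the same RAID set; the mount point, file name and sizes are placeholders, and the first two commands destroy whatever is on the target device:

    # raw device: large sequential writes go straight to RAIDframe
    dd if=/dev/zero of=/dev/rraid2d bs=64k count=16000

    # block device: goes through the buffer cache, one request at a time
    dd if=/dev/zero of=/dev/raid2d bs=64k count=16000

    # through the file system (assuming an FFS from raid2 mounted on /mnt),
    # which is the case that actually matters for the server:
    dd if=/dev/zero of=/mnt/testfile bs=64k count=16000 && sync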
RAIDframe level 5 write performance (was: tstile lockup)
There seems to be a fundamental problem with writing to a level 5 RAIDframe set, at least to the block device.

I've created five small wedges in the spared-out region of my 3TB SAS discs. In case it matters, they are connected to an mpt(4) controller. Then I configured a 5-component, 32-SpSU, level 5 RAID set. Unless I've gone mad, 32 sectors per SU equal 16k per SU, i.e. 64k per stripe.

Writing to that RAID's block device (raid2d) in 64k blocks gives me a dazzling throughput of 2.4MB/s and a dd mostly waiting in vnode. Writing to the raw device (rraid2d) gives 240MB/s, i.e. is two orders of magnitude faster. Reading is at 22MB/s for the block device and 315MB/s for the raw device. Writing 16k chunks to the block dev drops to 530kB/s.

To make sure there's nothing wrong with my discs, I then configured a two-component level 1 array with 128 SpSU (again giving a 64k stripe size, I hope). With that, I write 57MB/s to the block and 85MB/s to the raw device. Reading is 21MB/s for the block dev and 87MB/s for the raw dev.

So there's nothing wrong with my discs, I think; there's something wrong with level 5 RAIDframe. My impression is that NetBSD 4 didn't have that issue, but I can't test because the discs on that machine are too large.

How can I analyze/debug this? Performance is so insanely bad I cannot possibly put this file server into production.
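The stripe-size arithmetic being assumed above, spelled out (assuming 512-byte sectors; with 5 components at RAID level 5, one stripe unit per stripe holds parity):

    # 32 sectors per stripe unit * 512 bytes/sector = 16 KiB per SU
    echo $((32 * 512))          # 16384
    # 5 components - 1 parity = 4 data SUs per stripe
    echo $((4 * 32 * 512))      # 65536 = 64 KiB of data per full stripe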
tstile lockup (was: Serious WAPL performance problems)
> Invoke crash(8), then just perform "ps" and "t/a address" on each LWP
> which seems to be stuck (on "tstile" or elsewhere).
So it seems I can sort of lock up the machine for minutes with a simple
    dd if=/dev/zero of=/dev/dk14 bs=64k count=1000
(In case it matters, dk14 is on a RAID5 on 4+1 mpt(4) SAS discs)
I get an impressive dd throughput of around 350kB/s

Here's the crash(8) output during one of the partial lock-ups:

crash> ps | fgrep vnode
25995   1  3   1  100  fe8414a24100           dd  vnode
crash> t/a fe8414a24100
trace: pid 25995 lid 1 at 0xfe811e7de8f0
sleepq_block() at sleepq_block+0xad
cv_wait() at cv_wait+0x9f
vflushbuf() at vflushbuf+0x10f
ffs_spec_fsync() at ffs_spec_fsync+0x3e
VOP_FSYNC() at VOP_FSYNC+0x3e
vinvalbuf() at vinvalbuf+0xf2
spec_close() at spec_close+0x73
VOP_CLOSE() at VOP_CLOSE+0x33
vn_close() at vn_close+0x4e
closef() at closef+0x4a
fd_close() at fd_close+0x116
syscall() at syscall+0xc4
crash> ps | fgrep tstile
10629   5  3   5    0  fe838c989680        slave  tstile
10629   4  3   3    0  fe838c989260        slave  tstile
10629   3  3   4    0  fe8415e82060        slave  tstile
10629   2  3   2    0  fe8267e74ac0        slave  tstile
crash> t/a fe838c989680
trace: pid 10629 lid 5 at 0xfe811e25e4f0
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
wapbl_begin() at wapbl_begin+0x54
ffs_write() at ffs_write+0x8b9
VOP_WRITE() at VOP_WRITE+0x37
nfsrv_write() at nfsrv_write+0x923
nfssvc_nfsd() at nfssvc_nfsd+0x1d7
sys_nfssvc() at sys_nfssvc+0x22c
syscall() at syscall+0xc4
crash> t/a fe838c989260
trace: pid 10629 lid 4 at 0xfe811e27e4f0
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
wapbl_flush() at wapbl_flush+0x54
ffs_fsync() at ffs_fsync+0x112
VOP_FSYNC() at VOP_FSYNC+0x3e
nfsrv_commit() at nfsrv_commit+0x69a
nfssvc_nfsd() at nfssvc_nfsd+0x1d7
sys_nfssvc() at sys_nfssvc+0x22c
syscall() at syscall+0xc4
crash> t/a fe8415e82060
trace: pid 10629 lid 3 at 0xfe811dbde3c0
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x32
vn_lock() at vn_lock+0x7d
vget() at vget+0xa2
ufs_ihashget() at ufs_ihashget+0x84
ffs_vget() at ffs_vget+0xc1
ufs_fhtovp() at ufs_fhtovp+0x1f
ffs_fhtovp() at ffs_fhtovp+0x55
VFS_FHTOVP() at VFS_FHTOVP+0x1c
nfsrv_fhtovp() at nfsrv_fhtovp+0x9a
nfsrv_write() at nfsrv_write+0x502
nfssvc_nfsd() at nfssvc_nfsd+0x1d7
sys_nfssvc() at sys_nfssvc+0x22c
syscall() at syscall+0xc4
crash> t/a fe8267e74ac0
trace: pid 10629 lid 2 at 0xfe811e29e3c0
sleepq_block() at sleepq_block+0xad
turnstile_block() at turnstile_block+0x2bb
rw_vector_enter() at rw_vector_enter+0x1eb
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x32
vn_lock() at vn_lock+0x7d
vget() at vget+0xa2
ufs_ihashget() at ufs_ihashget+0x84
ffs_vget() at ffs_vget+0xc1
ufs_fhtovp() at ufs_fhtovp+0x1f
ffs_fhtovp() at ffs_fhtovp+0x55
VFS_FHTOVP() at VFS_FHTOVP+0x1c
nfsrv_fhtovp() at nfsrv_fhtovp+0x9a
nfsrv_write() at nfsrv_write+0x502
nfssvc_nfsd() at nfssvc_nfsd+0x1d7
sys_nfssvc() at sys_nfssvc+0x22c
syscall() at syscall+0xc4
crash> ps | fgrep tstile
10629   5  3   5    0  fe838c989680        slave  tstile
10629   4  3   3    0  fe838c989260        slave  tstile
10629   3  3   4    0  fe8415e82060        slave  tstile
10629   2  3   2    0  fe8267e74ac0        slave  tstile
crash> ps | fgrep vnode
25995   1  3   1  100  fe8414a24100           dd  vnode
crash> t/a fe8414a24100
trace: pid 25995 lid 1 at 0xfe811e7de8f0
sleepq_block() at sleepq_block+0xad
cv_wait() at cv_wait+0x9f
vflushbuf() at vflushbuf+0x10f
ffs_spec_fsync() at ffs_spec_fsync+0x3e
VOP_FSYNC() at VOP_FSYNC+0x3e
vinvalbuf() at vinvalbuf+0xf2
spec_close() at spec_close+0x73
VOP_CLOSE() at VOP_CLOSE+0x33
vn_close() at vn_close+0x4e
closef() at closef+0x4a
fd_close() at fd_close+0x116
syscall() at syscall+0xc4
crash> ps | fgrep bio
    0  81  3   2  200  fe811dcc7b20      ioflush  biolock
    0  43  1   5  200  fe811d3631a0    softbio/5
    0  37  1   4  200  fe811d33c160    softbio/4
    0  31  1   3  200  fe811d318120    softbio/3
    0  25  1   2  200  fe811d2f10e0    softbio/2
    0  19  1   1  200  fe811d2cd0a0    softbio/1
    0   4  1   0  200  fe841df43000    softbio/0
crash> t/a fe811dcc7b20
trace: pid 0 lid 81 at 0xfe811d8de980
sleepq_block() at sleepq_block+0xad
cv_timedwait() at cv_timedwait+0xa7
bbusy() at bbusy+0xe4
getblk() at getblk+0x60
bio_doread.clone.1() at bio_doread.clone.1+0x1d
bread() at bread+0x1a
ffs_update() at ffs_update+0x12c
ffs_sync()