On Friday, January 28, 2011 8:10:41 pm John Hickey wrote:
> There was a previous thread about this, but it doesn't look like there was
> any resolution:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-May/056986.html
>
> I run a fileserver for an Emulab (www.emulab.net) system. As such, the
> exports table is constantly modified as experiments are swapped in and out.
> We also get a lot of researchers using NFS for strange things. In this
> case, the exclusive lock was for a cache directory shared by about 36
> machines running Ubuntu 8.04 and mounting with NFSv2. Eventually, all our
> nfsd processes get stuck since the exclusive lock for the directory is
> never released. I could use any and all pointers on getting this fixed.
>
> What I am running:
>
> jjh@users: ~$ uname -a
> FreeBSD users.isi.deterlab.net 7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #9: Tue
> Sep 14 16:24:57 PDT 2010 r...@users.isi.deterlab.net:/usr/obj/usr/src/sys/USERS7 i386
>
> Here are the sleepchains for my system (note that 0xd1f72678 appears twice):
>
> 0xce089cf0: tag syncer, type VNON
>     usecount 1, writecount 0, refcount 2 mountedhere 0
>     flags ()
>     lock type syncer: EXCL (count 1) by thread 0xcdb4b000 (pid 46)
>
> 0xd1f72678: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 67 mountedhere 0
>     flags ()
>     v_object 0xd1e90e80 ref 0 pages 1
>     lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
>     ino 143173560, on dev mfid0s1f

From the stack trace, this vnode is the directory vnode that is the parent of
the new file being created.

> (kgdb) bt
> #0  sched_switch (td=0xce1146c0, newtd=Variable "newtd" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1936
> #1  0xc080a4a6 in mi_switch (flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:444
> #2  0xc0837aab in sleepq_switch (wchan=Variable "wchan" is not available.
> ) at /usr/src/sys/kern/subr_sleepqueue.c:497
> #3  0xc08380f6 in sleepq_wait (wchan=0xd4176394) at
> /usr/src/sys/kern/subr_sleepqueue.c:580
> #4  0xc080a92a in _sleep (ident=0xd4176394, lock=0xc0ceb498, priority=80,
> wmesg=0xc0bb656e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:230
> #5  0xc07ea9fa in acquire (lkpp=0xcd7375a0, extflags=Variable "extflags" is
> not available.
> ) at /usr/src/sys/kern/kern_lock.c:151
> #6  0xc07eb2ec in _lockmgr (lkp=0xd4176394, flags=8194, interlkp=0xd41763c4,
> td=0xce1146c0, file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c", line=2062)
> at /usr/src/sys/kern/kern_lock.c:384
> #7  0xc0a24765 in ffs_lock (ap=0xcd737608) at
> /usr/src/sys/ufs/ffs/ffs_vnops.c:377
> #8  0xc0b26876 in VOP_LOCK1_APV (vop=0xc0ca4740, a=0xcd737608) at
> vnode_if.c:1618
> #9  0xc0896d76 in _vn_lock (vp=0xd417633c, flags=8194, td=0xce1146c0,
> file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c", line=2062) at vnode_if.h:851

Note that this vnode (vp) doesn't show up in your list above. You can try
using my gdb scripts at www.freebsd.org/~jhb/gdb (you want gdb6* and do
'source gdb6'). You can then do 'vprint vp' at this frame and should see lock
details about who holds this lock. However, I would not expect the vnode lock
for a new i-node to be already held.

There's a chance though you are tripping over the bug fixed by these two
changes:

Author: jhb
Date: Fri Jul 16 20:23:24 2010
New Revision: 210173
URL: http://svn.freebsd.org/changeset/base/210173

Log:
  When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were
  changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE
  in the vnode lock's flags) until after they had determined if the vnode was
  a FIFO. This occurs after the vnode has been inserted into a VFS hash or
  some similar table, so it is possible for another thread to find this vnode
  via vget() on an i-node number and block on the vnode lock. If the lockmgr
  interlock (vnode interlock for vnode locks) is not held when clearing the
  LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the
  thread blocked on the vnode lock may never get woken up. Fix this by
  holding the vnode interlock while modifying the lock flags in this case.

  The softupdates code also toggles LK_NOSHARE in one function to close a
  race with snapshots. Fix this code to grab the interlock while fiddling
  with lk_flags.

Author: jhb
Date: Fri Aug 20 20:58:57 2010
New Revision: 211533
URL: http://svn.freebsd.org/changeset/base/211533

Log:
  Revert 210173 as it did not properly fix the bug. It assumed that the
  VI_LOCK() for a given vnode was used as the internal interlock for that
  vnode's v_lock lockmgr lock. This is not the case. Instead, add dedicated
  routines to toggle the LK_NOSHARE and LK_CANRECURSE flags. These routines
  lock the lockmgr lock's internal interlock to synchronize the updates to
  the flags member with other threads attempting to acquire the lock. The
  VN_LOCK_A*() macros now invoke these routines, and the softupdates code
  uses these routines to temporarily enable recursion on buffer locks.

  Reviewed by: kib

-- 
John Baldwin
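
For readers following the commit logs above, here is a toy model of how an unsynchronized update to a flags word can clobber a bit that another thread set under the interlock. It is not kernel code and is only a sketch: `ToyLock`, `WAITERS`, `allow_share()` and the bit values are hypothetical stand-ins chosen for illustration, and the interleaving is written out by hand rather than produced by real threads so that the result is deterministic.

```python
import threading

# Illustrative bit values only; they are NOT taken from the kernel headers.
LK_NOSHARE = 0x80
WAITERS = 0x100  # hypothetical "a thread is sleeping on this lock" bit


class ToyLock:
    """Stand-in for a lock whose flags word is guarded by an internal interlock."""

    def __init__(self):
        self.interlock = threading.Lock()
        self.flags = LK_NOSHARE


def racy_interleaving():
    """Thread A clears LK_NOSHARE without the interlock; thread B runs in between."""
    lk = ToyLock()
    stale = lk.flags                # A: load flags, interlock not held
    with lk.interlock:              # B: waiter records itself under the interlock
        lk.flags |= WAITERS
    lk.flags = stale & ~LK_NOSHARE  # A: store based on the stale load
    return lk.flags


def allow_share(lk):
    """Hypothetical dedicated routine: toggle the flag under the lock's own interlock."""
    with lk.interlock:
        lk.flags &= ~LK_NOSHARE


def fixed_interleaving():
    """Same two updates, but A's load and store can no longer straddle B's update."""
    lk = ToyLock()
    with lk.interlock:              # B: waiter records itself under the interlock
        lk.flags |= WAITERS
    allow_share(lk)                 # A: read-modify-write is one critical section
    return lk.flags


if __name__ == "__main__":
    print(hex(racy_interleaving()))   # 0x0: the WAITERS bit was lost
    print(hex(fixed_interleaving()))  # 0x100: the WAITERS bit survives
```

In the racy case the waiter's bit disappears, which in this model corresponds to a sleeping thread that nobody knows to wake up. The fix only helps if the flag update takes the same lock the waiter uses, which is the distinction the second commit log draws between VI_LOCK() and the lockmgr lock's internal interlock.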