Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
On 13 Sep, Ian Dowse wrote:
> For example, if you hold the reference count at 1 while calling the
> cleanup function, it allows that function to safely add and drop
> references, but if that cleanup function has a bug that drops one
> too many references then you end up recursing instead of detecting
> it as a negative reference count.  I've found in some other code
> that it works reasonably well to leave the reference count at zero,
> but set a flag to stop further 1->0 transitions from retriggering
> the cleanup.  Obviously other approaches will work too.

The cleanup function shouldn't be mucking with the reference count,
which means that the present implementation of nfs_inactive() is
broken, but I think there is already general agreement on that point.
The only possible exception would be to increase the reference count
to pass a reference to another thread, but that would be a silly thing
for a cleanup function to do, since it would no longer be cleaning up.

We could add a flag that would cause an immediate panic if the cleanup
function fiddled with the reference count as an aid to tracking down
broken code ;-)
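[For illustration only, here is a minimal user-space sketch of such a debug
flag.  All of the names below are invented and this is not FreeBSD kernel
code; it only shows the idea of panicking when the cleanup function touches
the reference count.]

/*
 * Hypothetical sketch: any obj_ref()/obj_rele() performed while the
 * cleanup function is running triggers an immediate panic.
 */
#include <stdio.h>
#include <stdlib.h>

struct obj {
	int	refcount;
	int	in_cleanup;	/* set while the cleanup function runs */
};

static void
obj_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

static void
obj_cleanup(struct obj *op)
{
	/* Object-specific cleanup; must not touch the reference count. */
	(void)op;
}

void
obj_ref(struct obj *op)
{
	if (op->in_cleanup)
		obj_panic("cleanup function touched the refcount");
	op->refcount++;
}

void
obj_rele(struct obj *op)
{
	if (op->in_cleanup)
		obj_panic("cleanup function touched the refcount");
	if (--op->refcount > 0)
		return;
	op->in_cleanup = 1;	/* refcount is frozen from here on */
	obj_cleanup(op);
	op->in_cleanup = 0;
	free(op);
}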
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
In message <[EMAIL PROTECTED]>, Terry Lambert writes:
>Ian Dowse wrote:
>> And I've just remembered a fifth :-) I think the old BSD code had
>> both an `open' count and a reference count. The open count is a
>> count of the real users of the vnode (it is what ufs_inactive really
>> wants to compare against 0), and the reference count is just for
>> places that you don't want the vnode to be recycled or destroyed.
>> This was probably lost when the encumbered BSD sources were rewritten.
>
>No, this went away with the vnode locking changes; it was in the
>4.4 code, for sure.  I think references are the correct thing here,
>and SunOS seems to agree, since that's how they implement, too. 8-).

I seem to have mis-remembered the details anyway :-) It doesn't look
as if there ever was the `open' count that I mentioned above.  Maybe
I was just thinking that it would be a good way to solve the issue of
matching VOP_CLOSEs with VOP_OPENs, since there are many cases in the
kernel that do not guarantee to do a VOP_CLOSE for each VOP_OPEN that
was performed.

Handling the dropping of a last reference is always tricky to get
right when complex operations can be performed from the
reference-dropping function (especially where that includes adding
and then removing references multiple times).  It's even harder to do
it in a way that continues to catch missing or extra references
caused by bugs in the functions called when the reference count hits
zero.

For example, if you hold the reference count at 1 while calling the
cleanup function, it allows that function to safely add and drop
references, but if that cleanup function has a bug that drops one
too many references then you end up recursing instead of detecting
it as a negative reference count.  I've found in some other code
that it works reasonably well to leave the reference count at zero,
but set a flag to stop further 1->0 transitions from retriggering
the cleanup.  Obviously other approaches will work too.

Ian
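[For illustration only, a minimal user-space sketch of the "leave the count
at zero, set a flag" pattern described above.  All names are invented; this
is not FreeBSD code.  The cleanup function may add and drop references, the
flag keeps a 1->0 transition inside it from retriggering the cleanup, and a
drop-too-many bug is still caught instead of recursing.]

#include <stdio.h>
#include <stdlib.h>

struct refobj {
	int	refcount;
	int	cleaning;	/* nonzero while the cleanup function runs */
};

static void
refobj_cleanup(struct refobj *op)
{
	/* May legitimately refobj_ref()/refobj_rele() the object. */
	(void)op;
}

void
refobj_ref(struct refobj *op)
{
	op->refcount++;
}

void
refobj_rele(struct refobj *op)
{
	if (op->refcount <= 0) {
		/* One drop too many: caught here rather than recursing. */
		fprintf(stderr, "refobj_rele: reference count underflow\n");
		abort();
	}
	if (--op->refcount > 0 || op->cleaning)
		return;
	op->cleaning = 1;	/* count stays at 0; flag blocks retriggering */
	refobj_cleanup(op);
	op->cleaning = 0;
	free(op);
}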
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
Ian Dowse wrote:
> And I've just remembered a fifth :-) I think the old BSD code had
> both an `open' count and a reference count. The open count is a
> count of the real users of the vnode (it is what ufs_inactive really
> wants to compare against 0), and the reference count is just for
> places that you don't want the vnode to be recycled or destroyed.
> This was probably lost when the encumbered BSD sources were rewritten.

No, this went away with the vnode locking changes; it was in the
4.4 code, for sure.  I think references are the correct thing here,
and SunOS seems to agree, since that's how they implement, too. 8-).

> At the time I was looking at it last, I remember thinking that the
> open count would allow vrele/vput to keep the reference count at 1
> during the VOP_INACTIVE() call, which is what you were proposing.
> It would also allow us to fix the problem of many places not matching
> each VOP_OPEN() with a VOP_CLOSE().  I suspect we could clean up a
> lot of related problems if the open count was brought back.

Yes.  It was murdered for good reason.

-- Terry
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
In message <[EMAIL PROTECTED]>, Don Lewis writes:
>After looking at ufs_inactive(), I'd like to add a fourth proposal

And I've just remembered a fifth :-) I think the old BSD code had
both an `open' count and a reference count.  The open count is a
count of the real users of the vnode (it is what ufs_inactive really
wants to compare against 0), and the reference count is just for
places that you don't want the vnode to be recycled or destroyed.
This was probably lost when the encumbered BSD sources were rewritten.

At the time I was looking at it last, I remember thinking that the
open count would allow vrele/vput to keep the reference count at 1
during the VOP_INACTIVE() call, which is what you were proposing.
It would also allow us to fix the problem of many places not matching
each VOP_OPEN() with a VOP_CLOSE().  I suspect we could clean up a
lot of related problems if the open count was brought back.

Ian
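[For illustration only, a rough sketch of the two-counter idea.  The names
(xvnode, xv_opencount, xv_usecount and the functions) are invented; this is
not the historical BSD code, just a picture of how an open count for the
"real users" could sit alongside a use count that merely pins the vnode.]

struct xvnode {
	int	xv_opencount;	/* VOP_OPEN()s not yet matched by VOP_CLOSE() */
	int	xv_usecount;	/* holders that must not see the vnode recycled */
};

void	xvnode_inactive(struct xvnode *);	/* filesystem inactive routine */
void	xvnode_free(struct xvnode *);		/* put on the free list / destroy */
void	xvnode_rele(struct xvnode *);

void
xvnode_close(struct xvnode *vp)
{
	vp->xv_opencount--;	/* the "real user" goes away here ... */
	xvnode_rele(vp);	/* ... but a reference is still held */
}

void
xvnode_rele(struct xvnode *vp)
{
	if (vp->xv_usecount > 1) {
		vp->xv_usecount--;
		return;
	}
	/*
	 * Keep xv_usecount at 1 across the inactive processing so the
	 * vnode cannot be recycled while it does I/O; the inactive
	 * routine tests xv_opencount == 0 instead of the use count.
	 */
	xvnode_inactive(vp);
	vp->xv_usecount--;
	xvnode_free(vp);
}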
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
On 11 Sep, Ian Dowse wrote:
> In message <[EMAIL PROTECTED]>, Don Lewis writes:
>>
>>A potentially better solution just occurred to me.  It looks like it
>>would be better if vrele() waited to decrement v_usecount until *after*
>>the call to VOP_INACTIVE() (and after the call to VI_LOCK()).  If that
>>were done, nfs_inactive() wouldn't need to muck with v_usecount at all.
>
> I've looked at this before; I think some filesystems (ufs anyway)
> depend on v_usecount being 0 when VOP_INACTIVE() is called.

Yup, though it would be easy to tweak ufs_inactive() to expect the
v_usecount to be one greater.  A bigger problem is the call to
vrecycle() in ufs_inactive().

> The
> patch I have had lying around for quite a while is below. It adds
> a vnode flag to avoid recursion into the last-reference handling
> code in vrele/vput, which is the real problem.
>
> It also guarantees that a vnode will not be recycled during
> VOP_INACTIVE(), so the nfs code no longer needs to hold an extra
> reference in the first place. The flag manipulation code got a bit
> messy after Jeff's vnode flag splitting work, so the patch could
> probably be improved.

If the need for nfs_inactive() to manipulate the reference count is
eliminated, that should be sufficient to eliminate the vrele/vput
recursion problem as well.

> Whatever way this is done, we should try to avoid adding more hacks
> to the nfs_inactive() code anyway.

Tweaking nfs_inactive() has the advantage of being the least intrusive
fix for this problem, but it isn't architecturally clean.

I'd also argue that decrementing the vnode reference count to zero,
but holding on to the reference for an extended period of time while
doing I/O on the vnode, is a kludge.  It also causes problems that
require kludgey fixes, like the reference count manipulation in
nfs_inactive().

For the most part the VI_INACTIVE flag is just another way of
indicating that we are holding a reference to the vnode, so both the
flag and the reference count need to be checked to see if the vnode
is in use.  The advantage of adding this flag versus just bumping the
reference count is that the flag allows us to avoid the bugs in the
current implementation while not requiring changes to code called
from VOP_INACTIVE() that expects the reference count to be zero.

Here are the three proposed fixes along with their advantages and
disadvantages:

Inline the v_usecount decrement code in nfs_inactive()
	+ Least intrusive fix for the recursion bug
	- Adds a kludge to a kludge

Call VOP_INACTIVE() before decrementing the reference count and tweak
the foo_inactive() methods to expect the reference count to be one
instead of zero
	+ Eliminates the kludge from nfs_inactive()
	+ We aren't lying about holding a reference, so hopefully less
	  potential for bugs
	- It is not obvious how to adapt ufs_inactive()

Add a VI_INACTIVE flag to prevent the vnode from being recycled
	+ Eliminates the kludge from nfs_inactive()
	+ Doesn't require changes to the other filesystems
	- Code complexity
	- Programmer needs to know when to look at VI_INACTIVE and when
	  looking at just the reference count is ok

After looking at ufs_inactive(), I'd like to add a fourth proposal
that would involve splitting VOP_INACTIVE() into two separate methods
(see the sketch after this list).  The first method would be called
before the reference count is decremented and would do any I/O.  The
second method would be called after the reference count is
decremented and could only be used for freeing resources,
manipulating flags, putting the vnode on the free list with
vrecycle(), etc.
	+ Eliminates the kludge from nfs_inactive()
	+ We don't lie about holding a reference while doing I/O
	- More extensive code changes
	- Bikeshed arguments about what to call the two methods
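[For illustration only, a schematic sketch of the fourth proposal against
the vrele() code quoted elsewhere in this thread.  The method names
VOP_INACTIVE_IO() and VOP_INACTIVE_CLEANUP() are invented, and locking
details (in particular around the second call) are glossed over; this is
not proposed or committed code.]

	if (vp->v_usecount == 1) {
		/*
		 * Still holding the last reference, so I/O here cannot
		 * race with the vnode being recycled.  Assume the method
		 * drops the vnode lock before returning, as VOP_INACTIVE()
		 * does today.
		 */
		if (vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK, td) == 0)
			VOP_INACTIVE_IO(vp, td);	/* flush buffers, do I/O */
		VI_LOCK(vp);
		vp->v_usecount--;			/* now really drop it */
		/*
		 * Reference count is zero: only bookkeeping is allowed
		 * from here on (free resources, set flags, vrecycle()).
		 */
		VOP_INACTIVE_CLEANUP(vp, td);
		if (VSHOULDFREE(vp))
			vfree(vp);
		else
			vlruvp(vp);
		VI_UNLOCK(vp);
	}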
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
In message <[EMAIL PROTECTED]>, Don Lewis writes:
>
>A potentially better solution just occurred to me.  It looks like it
>would be better if vrele() waited to decrement v_usecount until *after*
>the call to VOP_INACTIVE() (and after the call to VI_LOCK()).  If that
>were done, nfs_inactive() wouldn't need to muck with v_usecount at all.

I've looked at this before; I think some filesystems (ufs anyway)
depend on v_usecount being 0 when VOP_INACTIVE() is called.  The
patch I have had lying around for quite a while is below.  It adds
a vnode flag to avoid recursion into the last-reference handling
code in vrele/vput, which is the real problem.

It also guarantees that a vnode will not be recycled during
VOP_INACTIVE(), so the nfs code no longer needs to hold an extra
reference in the first place.  The flag manipulation code got a bit
messy after Jeff's vnode flag splitting work, so the patch could
probably be improved.

Whatever way this is done, we should try to avoid adding more hacks
to the nfs_inactive() code anyway.

Ian

Index: sys/vnode.h
===================================================================
RCS file: /home/iedowse/CVS/src/sys/sys/vnode.h,v
retrieving revision 1.206
diff -u -r1.206 vnode.h
--- sys/vnode.h	1 Sep 2002 20:37:21 -	1.206
+++ sys/vnode.h	11 Sep 2002 11:06:46 -
@@ -220,6 +220,7 @@
 #define	VI_DOOMED	0x0080	/* This vnode is being recycled */
 #define	VI_FREE		0x0100	/* This vnode is on the freelist */
 #define	VI_OBJDIRTY	0x0400	/* object might be dirty */
+#define	VI_INACTIVE	0x0800	/* VOP_INACTIVE is in progress */
 /*
  * XXX VI_ONWORKLST could be replaced with a check for NULL list elements
  * in v_synclist.
@@ -377,14 +378,14 @@
 
 /* Requires interlock */
 #define	VSHOULDFREE(vp)	\
-	(!((vp)->v_iflag & (VI_FREE|VI_DOOMED)) && \
+	(!((vp)->v_iflag & (VI_FREE|VI_DOOMED|VI_INACTIVE)) && \
 	 !(vp)->v_holdcnt && !(vp)->v_usecount && \
 	 (!(vp)->v_object || \
 	  !((vp)->v_object->ref_count || (vp)->v_object->resident_page_count)))
 
 /* Requires interlock */
 #define VMIGHTFREE(vp) \
-	(!((vp)->v_iflag & (VI_FREE|VI_DOOMED|VI_XLOCK)) && \
+	(!((vp)->v_iflag & (VI_FREE|VI_DOOMED|VI_XLOCK|VI_INACTIVE)) && \
 	 LIST_EMPTY(&(vp)->v_cache_src) && !(vp)->v_usecount)
 
 /* Requires interlock */
Index: nfsclient/nfs_node.c
===================================================================
RCS file: /home/iedowse/CVS/src/sys/nfsclient/nfs_node.c,v
retrieving revision 1.55
diff -u -r1.55 nfs_node.c
--- nfsclient/nfs_node.c	11 Jul 2002 17:54:58 -	1.55
+++ nfsclient/nfs_node.c	11 Sep 2002 11:06:46 -
@@ -289,21 +289,7 @@
 	} else
 		sp = NULL;
 	if (sp) {
-		/*
-		 * We need a reference to keep the vnode from being
-		 * recycled by getnewvnode while we do the I/O
-		 * associated with discarding the buffers unless we
-		 * are being forcibly unmounted in which case we already
-		 * have our own reference.
-		 */
-		if (ap->a_vp->v_usecount > 0)
-			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
-		else if (vget(ap->a_vp, 0, td))
-			panic("nfs_inactive: lost vnode");
-		else {
-			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
-			vrele(ap->a_vp);
-		}
+		(void)nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
 		/*
 		 * Remove the silly file that was rename'd earlier
 		 */
Index: kern/vfs_subr.c
===================================================================
RCS file: /home/iedowse/CVS/src/sys/kern/vfs_subr.c,v
retrieving revision 1.401
diff -u -r1.401 vfs_subr.c
--- kern/vfs_subr.c	5 Sep 2002 20:46:19 -	1.401
+++ kern/vfs_subr.c	11 Sep 2002 11:06:46 -
@@ -829,7 +829,8 @@
 	for (count = 0; count < freevnodes; count++) {
 		vp = TAILQ_FIRST(&vnode_free_list);
 
-		KASSERT(vp->v_usecount == 0,
+		KASSERT(vp->v_usecount == 0 &&
+		    (vp->v_iflag & VI_INACTIVE) == 0,
 		    ("getnewvnode: free vnode isn't"));
 
 		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
@@ -1980,8 +1981,8 @@
 	KASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1,
 	    ("vrele: missed vn_close"));
 
-	if (vp->v_usecount > 1) {
-
+	if (vp->v_usecount > 1 ||
+	    ((vp->v_iflag & VI_INACTIVE) && vp->v_usecount == 1)) {
 		vp->v_usecount--;
 		VI_UNLOCK(vp);
 
@@ -1991,13 +1992,20 @@
 	if (vp->v_usecount == 1) {
 		vp->v_usecount--;
 		/*
-		 * We must call VOP_INACTIVE with the node l
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
I remember seeing this panic many months back.  I had fixed this in
our FreeBSD source base.  Here is the scenario:

vrele()
{
	if (vp->v_usecount == 1) {
		vp->v_usecount--;
		if (VSHOULDFREE(vp))
			vfree(vp);
		vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK, p);	(A)
		VOP_INACTIVE() {	/* translates to nfs_inactive() */
			sp = np->n_sillyrename;
			if (sp) {
				if (ap->a_vp->v_usecount > 0) {
					...
				} else {
					vget(ap->a_vp, 0, p);
					nfs_vinvalbuf(ap->a_vp, 0, ...);
					vrele(ap->a_vp);	(B)
				}
				...
			}
		}
	}
}

vrele() calls nfs_inactive() since vp->v_usecount == 0.  Before
calling nfs_inactive(), it locks the vnode at (A) (since
VOP_INACTIVE() expects the vnode to be locked, and VOP_INACTIVE()
releases the lock at the end).

In nfs_inactive() a check is made to see if the file was
renamed/deleted while it was still in use.  In this case it was, and
so np->n_sillyrename is valid.  To make sure that the vnode does not
get reused while doing the I/O associated with discarding the
buffers, the use count is bumped up using vget().  Then
nfs_vinvalbuf() is called to flush and invalidate all dirty buffers.
After that, vrele() is done on the vnode at (B).

vrele() again comes to (A) and tries to lock the vnode, which was
already locked before by the same process in the first pass.  The
kernel panics with "locking against myself".  The vrele() that is
done at (B) causes the cycle to be repeated: vrele() calls
nfs_inactive() which calls vrele() and so forth.

In nfs_inactive(), we know that the function calling it will free the
vnode, so we need not call vfree().  We also know that we should not
lock the vnode again and call VOP_INACTIVE().  So the fix is to
inline the part of vrele() that just decrements the usecount.  The
following would be the fix.  This is from the 4.3 source base.

//depot/dev/freebsd-brick/src/sys/nfs/nfs_node.c#3 (text)

225c225,244
<			vrele(ap->a_vp);
---
>			/*
>			 * Inline part of vrele() since VOP_INACTIVE
>			 * has already been called.
>			 */
>			simple_lock(&ap->a_vp->v_interlock);
>			if (--ap->a_vp->v_usecount <= 0) {
> #ifdef DIAGNOSTIC
>				if (ap->a_vp->v_usecount < 0 ||
>				    ap->a_vp->v_writecount != 0) {
>					vprint("nfs_inactive: bad ref count",
>					    ap->a_vp);
>					panic("nfs_inactive: ref cnt");
>				}
> #endif
>				/*
>				 * Since this is nfs_inactive(), vfree()
>				 * would be called from the caller.
>				 */
>			}
>			simple_unlock(&ap->a_vp->v_interlock);

Deepankar

Don Lewis wrote:
> I've gotten a couple "lockmgr: locking against myself" panics today and
> it seems to be somewhat reproduceable.  I had the serial console hooked
> up for the last one, so I was able to get a stack trace:
>
>
> panic: lockmgr: locking against myself
> Debugger("panic")
> Stopped at Debugger+0x45: xchgl %ebx,in_Debugger.0
> db> tr
> Debugger(c0449abc) at Debugger+0x45
> panic(c0447760,215,0,c6cd7de0,10002) at panic+0x7c
> lockmgr(c6cd7ea8,10002,c6cd7de0,c6381f00,c6cd7de0) at lockmgr+0x412
> vop_sharedlock(e4b90b58,c047c440,c6cd7de0,1010002,c6381f00) at vop_sharedlock+0x6e
> vn_lock(c6cd7de0,10002,c6381f00) at vn_lock+0xb4
> vrele(c6cd7de0,c6cd7de0,0,c681cb00,c6381f00,1) at vrele+0x8d
> nfs_inactive(e4b90bdc,c047c3c0,c6cd7de0,c6381f00,c6381f00) at nfs_inactive+0xbd
> VOP_INACTIVE(c6cd7de0,c6381f00) at VOP_INACTIVE+0xa0
> vrele(c6cd7de0,c6834474,c6834474,e4b90c38,c02eed4a) at vrele+0x9b
> vn_close(c6cd7de0,2,c681cb00,c6381f00,e4b90c70) at vn_close+0x37
> vn_closefile(c6834474,c6381f00) at vn_closefile+0x1e
> fdrop_locked(c6834474,c6381f00,c04fe8ac,0,c0446020) at fdrop_locked+0x102
> fdrop(c6834474,c6381f00,c028a1ec,c6381f00,c6377c8c) at fdrop+0x24
> closef(c6834474,c6381f00,c6834474,c6c0d3f8,c6381f00) at closef+0x77
> close(c6381f00,e4b90d14,1,b9,202) at close+0x107
> syscall(2f,2f,2f,82124cc,23) at syscall+0x243
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (6, FreeBSD ELF32, close), eip = 0x4862ff3b, esp = 0xbfbff2dc, ebp = 0xbfbff358 ---
>
> db> show lockedvnods
> Locked vnodes
> 0xc6cd7de0: type VREG, usecount 0, writecount 0, refcount 0,
Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself
On 10 Sep, Don Lewis wrote:
> It looks like the call to vrele() from vn_close() is executing the
> following code:
>
>	if (vp->v_usecount == 1) {
>		vp->v_usecount--;
>		/*
>		 * We must call VOP_INACTIVE with the node locked.
>		 * If we are doing a vput, the node is already locked,
>		 * but, in the case of vrele, we must explicitly lock
>		 * the vnode before calling VOP_INACTIVE.
>		 */
>		if (vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK, td) == 0)
>			VOP_INACTIVE(vp, td);
>		VI_LOCK(vp);
>		if (VSHOULDFREE(vp))
>			vfree(vp);
>		else
>			vlruvp(vp);
>		VI_UNLOCK(vp);
>
> It has decremented v_usecount to 0, grabbed an exclusive lock on the
> vnode, and has called VOP_INACTIVE().

> It looks like nfs_inactive() is executing the following code:
>
>	if (sp) {
>		/*
>		 * We need a reference to keep the vnode from being
>		 * recycled by getnewvnode while we do the I/O
>		 * associated with discarding the buffers unless we
>		 * are being forcibly unmounted in which case we already
>		 * have our own reference.
>		 */
>		if (ap->a_vp->v_usecount > 0)
>			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
>		else if (vget(ap->a_vp, 0, td))
>			panic("nfs_inactive: lost vnode");
>		else {
>			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
>			vrele(ap->a_vp);
>		}

A potentially better solution just occurred to me.  It looks like it
would be better if vrele() waited to decrement v_usecount until *after*
the call to VOP_INACTIVE() (and after the call to VI_LOCK()).  If that
were done, nfs_inactive() wouldn't need to muck with v_usecount at all.
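[For illustration only, a rough sketch of that reordering against the
vrele() code quoted above.  This is not committed code, and as discussed
elsewhere in the thread the foo_inactive() methods would have to be
adjusted to expect a count of one rather than zero.]

	if (vp->v_usecount == 1) {
		/*
		 * Do NOT decrement yet; the last reference keeps the
		 * vnode from being recycled during VOP_INACTIVE(), so
		 * nfs_inactive() has no reason to vget()/vrele() it.
		 */
		if (vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK, td) == 0)
			VOP_INACTIVE(vp, td);
		VI_LOCK(vp);
		vp->v_usecount--;	/* deferred decrement */
		if (VSHOULDFREE(vp))
			vfree(vp);
		else
			vlruvp(vp);
		VI_UNLOCK(vp);
	}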
nfs_inactive() bug? -> panic: lockmgr: locking against myself
I've gotten a couple "lockmgr: locking against myself" panics today and
it seems to be somewhat reproduceable.  I had the serial console hooked
up for the last one, so I was able to get a stack trace:


panic: lockmgr: locking against myself
Debugger("panic")
Stopped at	Debugger+0x45:	xchgl	%ebx,in_Debugger.0
db> tr
Debugger(c0449abc) at Debugger+0x45
panic(c0447760,215,0,c6cd7de0,10002) at panic+0x7c
lockmgr(c6cd7ea8,10002,c6cd7de0,c6381f00,c6cd7de0) at lockmgr+0x412
vop_sharedlock(e4b90b58,c047c440,c6cd7de0,1010002,c6381f00) at vop_sharedlock+0x6e
vn_lock(c6cd7de0,10002,c6381f00) at vn_lock+0xb4
vrele(c6cd7de0,c6cd7de0,0,c681cb00,c6381f00,1) at vrele+0x8d
nfs_inactive(e4b90bdc,c047c3c0,c6cd7de0,c6381f00,c6381f00) at nfs_inactive+0xbd
VOP_INACTIVE(c6cd7de0,c6381f00) at VOP_INACTIVE+0xa0
vrele(c6cd7de0,c6834474,c6834474,e4b90c38,c02eed4a) at vrele+0x9b
vn_close(c6cd7de0,2,c681cb00,c6381f00,e4b90c70) at vn_close+0x37
vn_closefile(c6834474,c6381f00) at vn_closefile+0x1e
fdrop_locked(c6834474,c6381f00,c04fe8ac,0,c0446020) at fdrop_locked+0x102
fdrop(c6834474,c6381f00,c028a1ec,c6381f00,c6377c8c) at fdrop+0x24
closef(c6834474,c6381f00,c6834474,c6c0d3f8,c6381f00) at closef+0x77
close(c6381f00,e4b90d14,1,b9,202) at close+0x107
syscall(2f,2f,2f,82124cc,23) at syscall+0x243
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (6, FreeBSD ELF32, close), eip = 0x4862ff3b, esp = 0xbfbff2dc, ebp = 0xbfbff358 ---

db> show lockedvnods
Locked vnodes
0xc6cd7de0: type VREG, usecount 0, writecount 0, refcount 0, flags (VV_OBJBUF)
	tag VT_NFS, fileid 112174 fsid 0x100ff02

It looks like the call to vrele() from vn_close() is executing the
following code:

	if (vp->v_usecount == 1) {
		vp->v_usecount--;
		/*
		 * We must call VOP_INACTIVE with the node locked.
		 * If we are doing a vput, the node is already locked,
		 * but, in the case of vrele, we must explicitly lock
		 * the vnode before calling VOP_INACTIVE.
		 */
		if (vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK, td) == 0)
			VOP_INACTIVE(vp, td);
		VI_LOCK(vp);
		if (VSHOULDFREE(vp))
			vfree(vp);
		else
			vlruvp(vp);
		VI_UNLOCK(vp);

It has decremented v_usecount to 0, grabbed an exclusive lock on the
vnode, and has called VOP_INACTIVE().

It looks like nfs_inactive() is executing the following code:

	if (sp) {
		/*
		 * We need a reference to keep the vnode from being
		 * recycled by getnewvnode while we do the I/O
		 * associated with discarding the buffers unless we
		 * are being forcibly unmounted in which case we already
		 * have our own reference.
		 */
		if (ap->a_vp->v_usecount > 0)
			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
		else if (vget(ap->a_vp, 0, td))
			panic("nfs_inactive: lost vnode");
		else {
			(void) nfs_vinvalbuf(ap->a_vp, 0, sp->s_cred, td, 1);
			vrele(ap->a_vp);
		}

Because v_usecount was 0, nfs_inactive() calls vget(), which increments
v_usecount to 1, has called nfs_vinvalbuf(), and finally calls vrele()
again.  Because v_usecount is 1, vrele() is re-executing the same code
and blows up when it tries to grab the lock again.  If that didn't
cause a panic, VOP_INACTIVE() and nfs_inactive() would be called
recursively, which would probably be a bad thing ...

I suspect that nfs_inactive() needs to avoid calling vrele() here and
should just decrement the v_usecount with the appropriate locking.
Something like:

	VI_LOCK(ap->a_vp);
	ap->a_vp->v_usecount--;
	VI_UNLOCK(ap->a_vp);