On 11/11/25 09:59, Mark Johnston wrote:
The branch main has been updated by markj:

URL: 
https://cgit.FreeBSD.org/src/commit/?id=99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3

commit 99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3
Author:     Mark Johnston <[email protected]>
AuthorDate: 2025-11-11 14:47:06 +0000
Commit:     Mark Johnston <[email protected]>
CommitDate: 2025-11-11 14:58:59 +0000

     vnode: Rework vput() to avoid holding the vnode lock after decrementing

     It is not safe to modify the vnode structure after releasing one's
     reference.  Modify vput() to avoid this.  Use refcount_release_if_last()
     to opportunistically call vput_final() with the vnode lock held since we
     need the vnode lock in order to deactivate the vnode, and it's silly to
     drop the vnode lock and immediately reacquire it in this common case.

     Note that vunref() has a similar flaw.  D52628 aims to fix the problem
     more holistically, but this change fixes observable panics in the
     meantime.

     Reported by:    [email protected]
     Reported by:    [email protected]
     Reviewed by:    kib
     MFC after:      3 days
     Differential Revision:  https://reviews.freebsd.org/D52608
---
  sys/kern/vfs_subr.c | 7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 58975f7ac932..9cf983f6f89d 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -3713,11 +3713,12 @@ vput(struct vnode *vp)
        ASSERT_VOP_LOCKED(vp, __func__);
        ASSERT_VI_UNLOCKED(vp, __func__);
-       if (!refcount_release(&vp->v_usecount)) {
-               VOP_UNLOCK(vp);
+       if (refcount_release_if_last(&vp->v_usecount)) {
+               vput_final(vp, VPUT);
                return;
        }
-       vput_final(vp, VPUT);
+       VOP_UNLOCK(vp);
+       vrele(vp);
  }

This commit results in reliable hangs for me when I do a 'make -j 16
buildkernel' in a VM where the source directory is mounted over NFS, even if
there is nothing to build so the make is just traversing all the directories.
The hung process is stuck on a vnode lock.

Some of the stacks from procstat:

  PID    TID COMM                TDNAME              KSTACK
 1858 100343 make                -                   mi_switch+0x172 
sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 
nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 
vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 
VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 
openatfp+0x2b7
root@head:~ # procstat -kk 1860
 1860 100360 make                -                   mi_switch+0x172 
sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 
nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 
vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 
VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 
openatfp+0x2b7
root@head:~ # procstat -kk 1855
 1855 100314 make                -                   mi_switch+0x172 
sleepq_switch+0x109 sleeplk+0x110 lockmgr_xlock_hard+0x3d0 VOP_LOCK1_APV+0x32 
nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 
vput_final+0x269 vput+0xab vfs_lookup+0xa7a namei+0x35d vn_open_cred+0x592 
openatfp+0x2b7 sys_open+0x28 amd64_syscall+0x169 fast_syscall_common+0xf8

This last one is kind of interesting as it calls vput_final?

(kgdb) p vp
$2 = (struct vnode *) 0xfffff8016bb6c898
(kgdb) p vp->v_lock
$3 = {lock_object = {lo_name = 0xffffffff81467118 <nfs_vnode_tag> "nfs",
    lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900},
  lk_lock = 39, lk_exslpfail = 0, lk_pri = 44, lk_timo = 6}
(kgdb) p/x vp->v_lock->lk_lock
$4 = 0x27

So there is one shared lock, and both shared and exclusive waiters.  Seems like
a vnode lock has been leaked?  Or rather, the vput_final() in this case is
called from vrele(), so the vput() for 1855 has hit the new path.  I wonder if,
once it dropped its lock via VOP_UNLOCK(), it is no longer eligible to re-lock
it, as it needs some other sharer to also unlock before it can re-acquire the
lock in vput_final()?  LK_UPGRADE attempts don't block if there are shared
waiters but only one shared lock held, but if you do LK_UNLOCK and then
LK_XLOCK, that will indeed honor the other waiters and not acquire the lock.
So a vput() that held the last shared lock would have "worked" (upgraded)
before even with other waiters, but now it can drop the lock and then get
stuck.  The open question in this case is that some other thread must have
snuck in and acquired a shared lock between the VOP_UNLOCK() and the vrele()
and is now blocked while holding it.

Aha, so most of the other makes are stuck in VOP_LOOKUP on this same vnode
(which I believe is "/usr/src/sys/dev"), as in 1855 the vput() is inside of
namei(), and this is the state of ndp:

(kgdb) p *ndp
$9 = {
  ni_dirp = 0x481afdaa2d7d <error: Cannot access memory at address 
0x481afdaa2d7d>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe00d83d2d90,
  ni_startdir = 0x0, ni_rootdir = 0xfffff80005e461b8, ni_topdir = 0x0,
  ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {0,
        0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0},
  ni_vp = 0xfffff801bc007000, ni_dvp = 0xfffff8016bb6c898, ni_resflags = 1,
  ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1,
  ni_next = 0xfffff8000a0b0817 "", ni_cnd = {cn_flags = 8936259908,
    cn_cred = 0xfffff80005e53c00, cn_nameiop = LOOKUP, cn_lkflags = 524288,
    cn_pnbuf = 0xfffff8000a0b0800 "/usr/src/sys/dev/acpica",
    cn_nameptr = 0xfffff8000a0b0811 "acpica", cn_namelen = 6},
  ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe00d83d2d48},
  ni_rbeneath_dpp = 0x0, ni_nctrack_mnt = 0x0, ni_dvp_seqc = 0, ni_vp_seqc = 0}

But pid 1844 is a make blocked on another vnode:

(kgdb) p vp
$24 = (struct vnode *) 0xfffff801bc007000

And that one appears to be the "acpica" child directory:

(kgdb) p *vp->v_cache_dst.tqh_first
$27 = {nc_src = {le_next = 0xfffff801bc01b9b8, le_prev = 0xfffff801bc01b898},
  nc_dst = {tqe_next = 0x0, tqe_prev = 0xfffff801bc007058}, nc_hash = {
    csle_next = 0x0}, nc_dvp = 0xfffff8016bb6c898, n_un = {
    nu_vp = 0xfffff801bc007000, nu_neg = {neg_flag = 0 '\000',
      neg_hit = 112 'p'}}, nc_flag = 12 '\f', nc_nlen = 6 '\006',
  nc_name = 0xfffff801bc01b962 "acpica"}

And note its nc_dvp is the vnode that 1855 is trying to vput() and that all the
other makes are trying to VOP_LOCK() in nfs_lookup().

So my guess is that 1844 was able to VOP_LOCK() "/usr/src/sys/dev" in the
window between VOP_UNLOCK() and the vput_final() invoked from vrele().

Humm, except that the vnode in question shouldn't be in vput_final at all??

Back to 1855 (proc doing vput_final):

(kgdb)
#12 0xffffffff80c9bc89 in vput_final (vp=0xfffff8016bb6c898,
    func=func@entry=VRELE) at /usr/src/sys/kern/vfs_subr.c:3628
3628                            error = vn_lock(vp, LK_EXCLUSIVE | 
LK_INTERLOCK);
(kgdb) p vp
$6 = (struct vnode *) 0xfffff8016bb6c898
(kgdb) p vp->v_usecount
$7 = 14
(kgdb) p vp->v_holdcnt
$8 = 7

Or did the other makes manage to acquire new use counts while 1855 was blocked
on the lock in vput_final()?

Anyway, it seems 14 makes are blocked on this vnode (/usr/src/sys/dev).  1844
is blocked on the "acpica" vnode, and another make process (1856) is blocked on
"acpica", but via getdirentries() instead of namei():

#4  0xffffffff80b62ec0 in sleeplk (lk=lk@entry=0xfffff801bc007070,
    flags=flags@entry=2098176, ilk=ilk@entry=0xfffff801bc007098,
    wmesg=wmesg@entry=0xffffffff81467118 <nfs_vnode_tag> "nfs",
    pri=<optimized out>, pri@entry=44, timo=timo@entry=6, queue=1)
    at /usr/src/sys/kern/kern_lock.c:302
#5  0xffffffff80b60fc3 in lockmgr_slock_hard (lk=0xfffff801bc007070,
    flags=2098176, ilk=0xfffff801bc007098, file=<optimized out>, line=4333,
    lwa=0x0) at /usr/src/sys/kern/kern_lock.c:693
#6  0xffffffff811b97a2 in VOP_LOCK1_APV (
    vop=0xffffffff81aeeb38 <default_vnodeops>, a=a@entry=0xfffffe00d8445c88)
    at vnode_if.c:2103
#7  0xffffffff80a6050c in nfs_lock (ap=ap@entry=0xfffffe00d8445c88)
    at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:344
#8  0xffffffff80c83d90 in vop_sigdefer (vop=<optimized out>,
    a=0xfffffe00d8445c88) at /usr/src/sys/kern/vfs_default.c:1502
#9  0xffffffff811b97a2 in VOP_LOCK1_APV (
    vop=0xffffffff81aada28 <newnfs_vnodeops>, a=a@entry=0xfffffe00d8445c88)
    at vnode_if.c:2103
#10 0xffffffff80cb6b43 in VOP_LOCK1 (vp=0xfffff801bc007000, flags=2098176,
    file=0xffffffff81359deb "/usr/src/sys/kern/vfs_syscalls.c", line=4333)
    at ./vnode_if.h:1316
#11 _vn_lock (vp=vp@entry=0xfffff801bc007000, flags=flags@entry=2098176,
    file=<unavailable>, line=<unavailable>, line@entry=4333)
    at /usr/src/sys/kern/vfs_vnops.c:1970
#12 0xffffffff80cb2d94 in kern_getdirentries (td=0xfffff80006cb5000,
    fd=<optimized out>,
    buf=0x2088dba27000 <error: Cannot access memory at address 0x2088dba27000>, 
count=4096, basep=basep@entry=0xfffffe00d8445df0, residp=residp@entry=0x0,
    bufseg=UIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4333
#13 0xffffffff80cb32a9 in sys_getdirentries (td=<unavailable>,
    uap=0xfffff80006cb5428) at /usr/src/sys/kern/vfs_syscalls.c:4287

The "acpica" lock is held by a writer, at least:

(kgdb) p vp
$9 = (struct vnode *) 0xfffff801bc007000
(kgdb) p vp->v_lock
$10 = {lock_object = {lo_name = 0xffffffff81467118 <nfs_vnode_tag> "nfs",
    lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900},
  lk_lock = 18446735277917198210, lk_exslpfail = 0, lk_pri = 44, lk_timo = 6}
(kgdb) p/x vp->v_lock.lk_lock
$14 = 0xfffff80011ebd782
(kgdb) set $td = (struct thread *)0xfffff80011ebd780
(kgdb) p $td->td_tid
$15 = 100314
(kgdb) p $td->td_proc->p_comm
$16 = "make", '\000' <repeats 15 times>
(kgdb) p $td->td_proc->p_pid
$17 = 1855

Oh dear, so that's the deadlock.

1855 needs to get an exclusive lock on "/usr/src/sys/dev/" to finish its vput()
_while_ it holds an exclusive lock on "/usr/src/sys/dev/acpica".  And the other
shared lock on "/usr/src/sys/dev" is held by 1844 who is blocked trying to share
lock "/usr/src/sys/dev/acpica" during lookup().

So by doing the VOP_UNLOCK() and then reacquiring the lock via vrele(), you've
introduced a vnode LOR, as the vput() of the parent directory can now try to
re-lock the parent directory's vnode while we hold the lock on the child
vnode.

--
John Baldwin

