Re: panic: assertwaitok: non-zero mutex count: 2

Vitaliy Makkoveev Tue, 01 Jul 2025 15:27:56 -0700

On Tue, Jul 01, 2025 at 11:45:23PM +0200, Alexander Bluhm wrote:
> On Tue, Jul 01, 2025 at 09:08:48PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 1 Jul 2025 20:41:47 +0200
> > > From: Alexander Bluhm <[email protected]>
> > > 
> > > Hi
> > > 
> > > I see this crash on a vmd guest while running regress/sys/kern/sosplice.
> > > Note that it is a single CPU GENERIC kernel.  sysctl kern.splassert=2
> > > 
> > > panic: assertwaitok: non-zero mutex count: 2
> > > Stopped at      db_enter+0x14:  popq    %rbp
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > *519542  91140      0         0x1          0    0  perl
> > > db_enter() at db_enter+0x14
> > > panic(ffffffff82595a39) at panic+0xc9
> > > assertwaitok() at assertwaitok+0x9e
> > > mi_switch() at mi_switch+0x19c
> > > pool_get(ffffffff82a28d28,1) at pool_get+0xe7
> > > uvm_mapent_alloc(ffffffff82b0eb60,8) at uvm_mapent_alloc+0x2b2
> > > uvm_map_mkentry(ffffffff82b0eb60,fffffd8006e6cbd0,fffffd8006e6cbd0,ffff80002a320000,1000,8,79bcd127adccfb5a,7) at uvm_map_mkentry+0x63
> > > uvm_mapent_clone(ffffffff82b0eb60,ffff80002a320000,1000,0,1,7,a33acdf397a7ed83,fffffd806c1f89e8,fffffd806e3beb40,c) at uvm_mapent_clone+0x92
> > > uvm_map_extract(fffffd806e3beb40,83d6d1f7000,1000,ffff80002a39f048,8) at uvm_map_extract+0x309
> > > sys_kbind(ffff80002a294020,ffff80002a39f160,ffff80002a39f0d0) at sys_kbind+0x3a1
> > > syscall(ffff80002a39f160) at syscall+0x444
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x783818799758, count: 3
> > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > 
> > I don't see anything in that codepath that would end up there with a
> > mutex held.  So my guess is you somehow returned to userland with a
> > mutex held because of a missing mtx_leave() call in an error path.  Or
> > maybe an interrupt handler that forgot to unlock a mutex?
> 
> That makes sense.  I also get the same panic with the same test
> but different stacktrace.
> 
> panic: assertwaitok: non-zero mutex count: 2
> Stopped at      db_enter+0x14:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *184775  13589      0         0x1          0    0  perl
> db_enter() at db_enter+0x14
> panic(ffffffff82595a39) at panic+0xc9
> assertwaitok() at assertwaitok+0x9e
> mi_switch() at mi_switch+0x19c
> pool_get(ffffffff82b1fb10,1) at pool_get+0xe7
> m_split(fffffd806964f900,9,1) at m_split+0xa9
> somove(ffff800000b0d6f8,1) at somove+0xb2a
> sosplice(ffff800000b0d6f8,1,3d,fffffd8006e71430) at sosplice+0x513
> sys_setsockopt(ffff80002a2a1498,ffff80002a3995e0,ffff80002a399550) at sys_setsockopt+0x169
> syscall(ffff80002a3995e0) at syscall+0x444
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x76a44e747000, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> 
> Strangely it only happens with a GENERIC kernel, but not with WITNESS.
> 
> bluhm
>


I think these panics are different.

The "while (((rcvstate & SS_RCVATMARK).." loop of somove() also has two
m_get(wait, ...) calls, which must be moved outside of the mutex(9)
section too. The loop operates only on local data, so this is possible.
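
To illustrate the ordering only, here is a rough userland sketch. It is
not kernel code: every toy_* name is hypothetical, the counter merely
imitates the mutex count that assertwaitok reports in the panic above,
and returning NULL stands in for that panic.

```c
#include <stdlib.h>

/* Hypothetical userland model, not the kernel's mutex(9) code. */
struct toy_mtx {
	int held;
};

/* Imitates the count of currently held mutexes. */
static int toy_mutex_level;

static void
toy_mtx_enter(struct toy_mtx *m)
{
	m->held = 1;
	toy_mutex_level++;
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	m->held = 0;
	toy_mutex_level--;
}

/*
 * Stands in for an allocation that may sleep.  Where the kernel
 * would panic on a non-zero mutex count, this model returns NULL.
 */
static void *
toy_alloc_wait(size_t len)
{
	if (toy_mutex_level != 0)
		return NULL;
	return malloc(len);
}

/* Broken ordering: the sleeping allocation runs with both mutexes held. */
static void *
toy_move_broken(struct toy_mtx *rcv, struct toy_mtx *snd)
{
	void *o;

	toy_mtx_enter(rcv);
	toy_mtx_enter(snd);
	o = toy_alloc_wait(1);		/* mutex count is 2 here */
	toy_mtx_leave(snd);
	toy_mtx_leave(rcv);
	return o;
}

/* Fixed ordering: leave both mutexes, allocate, then re-enter. */
static void *
toy_move_fixed(struct toy_mtx *rcv, struct toy_mtx *snd)
{
	void *o;

	toy_mtx_enter(rcv);
	toy_mtx_enter(snd);
	/* ... work on state protected by the mutexes ... */
	toy_mtx_leave(snd);
	toy_mtx_leave(rcv);

	o = toy_alloc_wait(1);		/* only local data is touched here */

	toy_mtx_enter(rcv);
	toy_mtx_enter(snd);
	/* ... continue with protected state ... */
	toy_mtx_leave(snd);
	toy_mtx_leave(rcv);
	return o;
}
```

In this model toy_move_broken() fails its allocation because the count
is 2 at the call, while toy_move_fixed() succeeds and leaves the count
at zero. The caller frees the returned buffer.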

Index: sys/kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.378
diff -u -p -r1.378 uipc_socket.c
--- sys/kern/uipc_socket.c      23 May 2025 23:41:46 -0000      1.378
+++ sys/kern/uipc_socket.c      1 Jul 2025 22:21:36 -0000
@@ -1763,21 +1763,22 @@ somove(struct socket *so, int wait)
            (so->so_options & SO_OOBINLINE)) {
                struct mbuf *o = NULL;
 
+               mtx_leave(&sosp->so_snd.sb_mtx);
+               mtx_leave(&so->so_rcv.sb_mtx);
+
                if (rcvstate & SS_RCVATMARK) {
                        o = m_get(wait, MT_DATA);
                        rcvstate &= ~SS_RCVATMARK;
                } else if (oobmark) {
                        o = m_split(m, oobmark, wait);
                        if (o) {
-                               mtx_leave(&sosp->so_snd.sb_mtx);
-                               mtx_leave(&so->so_rcv.sb_mtx);
                                solock_shared(sosp);
                                error = pru_send(sosp, m, NULL, NULL);
                                sounlock_shared(sosp);
-                               mtx_enter(&so->so_rcv.sb_mtx);
-                               mtx_enter(&sosp->so_snd.sb_mtx);
 
                                if (error) {
+                                       mtx_enter(&so->so_rcv.sb_mtx);
+                                       mtx_enter(&sosp->so_snd.sb_mtx);
                                        if (sosp->so_snd.sb_state &
                                            SS_CANTSENDMORE)
                                                error = EPIPE;
@@ -1795,15 +1796,13 @@ somove(struct socket *so, int wait)
                        o->m_len = 1;
                        *mtod(o, caddr_t) = *mtod(m, caddr_t);
 
-                       mtx_leave(&sosp->so_snd.sb_mtx);
-                       mtx_leave(&so->so_rcv.sb_mtx);
                        solock_shared(sosp);
                        error = pru_sendoob(sosp, o, NULL, NULL);
                        sounlock_shared(sosp);
-                       mtx_enter(&so->so_rcv.sb_mtx);
-                       mtx_enter(&sosp->so_snd.sb_mtx);
 
                        if (error) {
+                               mtx_enter(&so->so_rcv.sb_mtx);
+                               mtx_enter(&sosp->so_snd.sb_mtx);
                                if (sosp->so_snd.sb_state & SS_CANTSENDMORE)
                                        error = EPIPE;
                                m_freem(m);
@@ -1818,6 +1817,9 @@ somove(struct socket *so, int wait)
                        }
                        m_adj(m, 1);
                }
+
+               mtx_enter(&so->so_rcv.sb_mtx);
+               mtx_enter(&sosp->so_snd.sb_mtx);
        }
 
        /* Append all remaining data to drain socket. */
