Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed

Martin Pieuchot Wed, 28 Feb 2024 05:46:08 -0800

On 28/02/24(Wed) 16:39, Vitaliy Makkoveev wrote:
> On Wed, Feb 28, 2024 at 02:22:31PM +0100, Mark Kettenis wrote:
> > > Date: Wed, 28 Feb 2024 16:16:09 +0300
> > > From: Vitaliy Makkoveev <m...@openbsd.org>
> > > 
> > > On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote:
> > > > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote:
> > > > > Hi,
> > > > > 
> > > > > thank you for looking into it, and for the advice.
> > > > > 
> > > > > On Wed, 28 Feb 2024 10:13:06 +0000
> > > > > Stuart Henderson <s...@spacehopper.org> wrote:
> > > > > 
> > > > > > Please try to re-type at least the most important bits from a
> > > > > > screenshot so readers can quickly see which subsystems are involved.
> > > > > 
> > > > > > Below is a manual transcript of the whole screenshot, hopefully no typos.
> > > > > > 
> > > > > > If you have any advice on what I should do if it happens again in order
> > > > > > to get as much info for debuggers as possible, please let me know.
> > > > > 
> > > > > splassert: assertwaitok: want 0 have 4
> > > > > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file "/usr/src/sys/kern/kern_sched.c", line 373
> > > > > Stopped at db_enter+0x14: popq %rbp
> > > > >    TID    PID  UID   PRFLAGS  PFLAGS  CPU  COMMAND
> > > > > 199248  36172  577      0x10       0    1  openvpn
> > > > > 490874  47446    0   0x14000   0x200    2  wg_handshake
> > > > >  71544   9311    0   0x14000   0x200    3  softnet0
> > > > > db_enter() at db_enter+0x14
> > > > > panic(ffffffff820a4b9f) at panic+0xc3
> > > > > > __assert(ffffffff82121fcb,ffffffff8209ae5f,175,ffffffff82092fbf) at assert+0x29
> > > > > sched_chooseproc() at sched_chooseproc+0x26d
> > > > > mi_switch() at mi_switch+0x17f
> > > > > sleep_finish(0,1) at sleep_finish+0x107
> > > > > rw_enter(ffff800008003cf0,2) at rw_enter+0x1ad
> > > > > noise_remote_ready(ffff800008003bf0) at noise_remote_ready+0x33
> > > > > wg_qstart(fff800000a622a8) at wg_qstart+0x18c
> > > > > ifq_serialize(ffff800000a622a8,ffff800000a62390) at ifq_serialize+0xfd
> > > > > hfsc_deferred(ffff800000a62000) at hfsc_deferred+0x68
> > > > > > softclock_process_tick_timeout(ffff80000115e248,1) at softclock_process_tick_timeout+0xfb
> > > > > softclock(0) at softclock+0xb8
> > > > > softintr_dispatch(0) at softintr_dispatch+0xeb
> > > > > end trace frame: 0xffff800020dbc730, count:0
> > > > > 
> > > > 
> > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work
> > > > when called from inside a timeout aka softclock? This is interrupt context;
> > > > code is not allowed to sleep there.
> > > > 
> > > 
> > > Not only wg(4). Depending on interface queue usage, ifq_start() either
> > > schedules (*if_qstart)() or calls it, so all interfaces which use rwlock(9)
> > > in their (*if_qstart)() handler are at risk.
> > > 
> > > What about always scheduling (*if_qstart)()?
> > 
> > Why would you want to introduce additional latency?
> > 
> 
> I suppose it is a lesser evil than strictly denying rwlocks in (*if_qstart)().
> Anyway it will be scheduled unless `seq_len' exceeds the watermark.


Please no.  This is not going to happen.  wg(4) has to be fixed.  Let's
not change the design of the kernel every time a bug is found.
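
For readers following the thread, the constraint being argued about can be
sketched as a small userland model in C. This is only an illustration, not
OpenBSD code and not a proposed fix for wg(4): every name below (model_qstart,
model_sleeping_lock, model_task_add and so on) is hypothetical. It models a
start routine that takes a sleeping lock, once called directly from a timeout
(interrupt context) and once deferred to a queue that is drained in process
context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names are kernel APIs. */
enum ctx { CTX_PROCESS, CTX_INTERRUPT };

static enum ctx current_ctx = CTX_PROCESS;
static int sleep_violations;	/* sleeping lock taken in interrupt context */
static int packets_sent;

/* Stand-in for a lock that may sleep: only legal in process context. */
static void
model_sleeping_lock(void)
{
	if (current_ctx == CTX_INTERRUPT)
		sleep_violations++;	/* a real kernel would assert or panic */
}

/* Stand-in for a start routine that needs the sleeping lock. */
static void
model_qstart(void)
{
	model_sleeping_lock();
	packets_sent++;
}

/* Minimal deferred-work queue, drained later in process context. */
#define MAX_TASKS 8
static void (*task_queue[MAX_TASKS])(void);
static size_t task_count;

static bool
model_task_add(void (*fn)(void))
{
	if (task_count == MAX_TASKS)
		return false;
	task_queue[task_count++] = fn;
	return true;
}

static void
model_task_run_all(void)
{
	enum ctx saved = current_ctx;

	current_ctx = CTX_PROCESS;
	for (size_t i = 0; i < task_count; i++)
		task_queue[i]();
	task_count = 0;
	current_ctx = saved;
}

/* Timeout handler that calls the start routine directly. */
static void
model_timeout_direct(void)
{
	enum ctx saved = current_ctx;

	current_ctx = CTX_INTERRUPT;
	model_qstart();		/* may "sleep" in interrupt context */
	current_ctx = saved;
}

/* Timeout handler that only schedules the start routine. */
static void
model_timeout_deferred(void)
{
	enum ctx saved = current_ctx;

	current_ctx = CTX_INTERRUPT;
	(void)model_task_add(model_qstart);	/* runs later, adds latency */
	current_ctx = saved;
}
```

In this model the direct call records a violation, while the deferred call
does not but delays the work until the queue is drained, which is the latency
trade-off raised in the thread.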
