Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
Claudio Jeker wrote: > Thanks for the details. I suggest to commit this so we can get some tests > quickly and hopefully it helps to fix some of the wg(4) crashes people > hit. According to your other mail there are some additional races that > require fixing. quickly means now, and we have about 2 weeks to pull it out (it has a .h file diff in it, but userland does not really look at it)
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Tue, Mar 05, 2024 at 12:16:37PM +0300, Vitaliy Makkoveev wrote: > On Mon, Mar 04, 2024 at 01:00:32PM +0100, Claudio Jeker wrote: > > On Wed, Feb 28, 2024 at 06:03:14PM +0300, Vitaliy Makkoveev wrote: > > > On Wed, Feb 28, 2024 at 02:45:44PM +0100, Martin Pieuchot wrote: > > > > > > > > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() > > > > > > > schedules > > > > > > > (*if_qstart)() or calls it, so all the interfaces with use > > > > > > > rwlock(9) in > > > > > > > (*if_qstart)() handler are in risk. > > > > > > > > > > > > > > What about to always schedule (*if_qstart)()? > > > > > > > > > > > > Why would you want to introduce additional latence? > > > > > > > > > > > > > > > > I suppose it the less evil than strictly deny rwlocks in > > > > > (*if_qstart)(). > > > > > Anyway it will be scheduled unless `seq_len' exceeds the watermark. > > > > > > > > Please no. This is not going to happen. wg(4) has to be fixed. Let's > > > > not change the design of the kernel every time a bug is found. > > > > > > > > > > I'm not the fan of ifq_start() behaviour. > > > > > > wg(4) needs to convert `t_lock', `r_keypair_lock' and `c_lock' rwlocks > > > to mutexes. I used mtx_init_flags() to keep existing names. > > > > You do not mention if you verified that any of those codepath may sleep or > > not. > > I use wg(4), so, obviously, I verified this. > > wg_qstart() calls the following routines: noise_remote_ready(), > wg_tag_get(), wg_timers_event_want_initiation() and wg_queue_out(). > > noise_remote_ready() has the sleep point produced by `r_keypair_lock' > rwlock. We also acquire `c_lock' rwlock with `r_keypair_lock' rwlock > held. I changed them both to `r_keypair_mtx' and `c_mtx' mutexes. > > `c_lock' serialized paths noise_counter_send() and noise_counter_recv() > have no sleep points. > > `r_keypair_lock' serialized paths noise_remote_begin_session(), > noise_remote_clear(), noise_remote_expire_current(), > noise_remote_ready(), noise_remote_encrypt() and noise_remote_decrypt() > have no sleep points. The suspicious noise_remote_keypair_allocate() > only performs operations around list. > > wg_tag_get() has no sleep points. > > wg_timers_event_want_initiation() has the sleep point produced by > `t_lock' rwlock. We also acquire `t_handshake_mtx' mutex with > `t_lock' held. I changed `t_lock' to `t_mtx' mutex. > > wg_queue_out() has no sleep points. > > `c_lock' protected paths has no sleep points. > > The `t_mtx' is the simplest for review. It protects `t_disabled' and > `t_need_another_keepalive' and the serialized paths are timeout or task > scheduling. > > > ...I don't have the time to go over all those codepaths and check. > > > > I thought the reported is interesting to fix this and provides some > feedback. Thanks for the details. I suggest to commit this so we can get some tests quickly and hopefully it helps to fix some of the wg(4) crashes people hit. According to your other mail there are some additional races that require fixing. Diff looks OK to me -- but I don't use wg(4). > > > Index: sys/net/if_wg.c > > > === > > > RCS file: /cvs/src/sys/net/if_wg.c,v > > > retrieving revision 1.36 > > > diff -u -p -r1.36 if_wg.c > > > --- sys/net/if_wg.c 18 Jan 2024 08:46:41 - 1.36 > > > +++ sys/net/if_wg.c 28 Feb 2024 14:49:16 - > > > @@ -150,8 +150,8 @@ struct wg_index { > > > }; > > > > > > struct wg_timers { > > > - /* t_lock is for blocking wg_timers_event_* when setting t_disabled. */ > > > - struct rwlockt_lock; > > > + /* t_mtx is for blocking wg_timers_event_* when setting t_disabled. */ > > > + struct mutex t_mtx; > > > > > > int t_disabled; > > > int t_need_another_keepalive; > > > @@ -930,7 +930,7 @@ void > > > wg_timers_init(struct wg_timers *t) > > > { > > > bzero(t, sizeof(*t)); > > > - rw_init(>t_lock, "wg_timers"); > > > + mtx_init_flags(>t_mtx, IPL_NET, "wg_timers", 0); > > > mtx_init(>t_handshake_mtx, IPL_NET); > > > > > > timeout_set(>t_new_handshake, wg_timers_run_new_handshake, t); > > > @@ -945,19 +945,19 @@ wg_timers_init(struct wg_timers *t) > > > void > > > wg_timers_enable(struct wg_timers *t) > > > { > > > - rw_enter_write(>t_lock); > > > + mtx_enter(>t_mtx); > > > t->t_disabled = 0; > > > - rw_exit_write(>t_lock); > > > + mtx_leave(>t_mtx); > > > wg_timers_run_persistent_keepalive(t); > > > } > > > > > > void > > > wg_timers_disable(struct wg_timers *t) > > > { > > > - rw_enter_write(>t_lock); > > > + mtx_enter(>t_mtx); > > > t->t_disabled = 1; > > > t->t_need_another_keepalive = 0; > > > - rw_exit_write(>t_lock); > > > + mtx_leave(>t_mtx); > > > > > > timeout_del_barrier(>t_new_handshake); > > > timeout_del_barrier(>t_send_keepalive); > > > @@ -969,12 +969,12 @@ wg_timers_disable(struct wg_timers *t) > > > void > > >
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Mon, Mar 04, 2024 at 01:00:32PM +0100, Claudio Jeker wrote: > On Wed, Feb 28, 2024 at 06:03:14PM +0300, Vitaliy Makkoveev wrote: > > On Wed, Feb 28, 2024 at 02:45:44PM +0100, Martin Pieuchot wrote: > > > > > > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() > > > > > > schedules > > > > > > (*if_qstart)() or calls it, so all the interfaces with use > > > > > > rwlock(9) in > > > > > > (*if_qstart)() handler are in risk. > > > > > > > > > > > > What about to always schedule (*if_qstart)()? > > > > > > > > > > Why would you want to introduce additional latence? > > > > > > > > > > > > > I suppose it the less evil than strictly deny rwlocks in (*if_qstart)(). > > > > Anyway it will be scheduled unless `seq_len' exceeds the watermark. > > > > > > Please no. This is not going to happen. wg(4) has to be fixed. Let's > > > not change the design of the kernel every time a bug is found. > > > > > > > I'm not the fan of ifq_start() behaviour. > > > > wg(4) needs to convert `t_lock', `r_keypair_lock' and `c_lock' rwlocks > > to mutexes. I used mtx_init_flags() to keep existing names. > > You do not mention if you verified that any of those codepath may sleep or > not. I use wg(4), so, obviously, I verified this. wg_qstart() calls the following routines: noise_remote_ready(), wg_tag_get(), wg_timers_event_want_initiation() and wg_queue_out(). noise_remote_ready() has the sleep point produced by `r_keypair_lock' rwlock. We also acquire `c_lock' rwlock with `r_keypair_lock' rwlock held. I changed them both to `r_keypair_mtx' and `c_mtx' mutexes. `c_lock' serialized paths noise_counter_send() and noise_counter_recv() have no sleep points. `r_keypair_lock' serialized paths noise_remote_begin_session(), noise_remote_clear(), noise_remote_expire_current(), noise_remote_ready(), noise_remote_encrypt() and noise_remote_decrypt() have no sleep points. The suspicious noise_remote_keypair_allocate() only performs operations around list. wg_tag_get() has no sleep points. wg_timers_event_want_initiation() has the sleep point produced by `t_lock' rwlock. We also acquire `t_handshake_mtx' mutex with `t_lock' held. I changed `t_lock' to `t_mtx' mutex. wg_queue_out() has no sleep points. `c_lock' protected paths has no sleep points. The `t_mtx' is the simplest for review. It protects `t_disabled' and `t_need_another_keepalive' and the serialized paths are timeout or task scheduling. > ...I don't have the time to go over all those codepaths and check. > I thought the reported is interesting to fix this and provides some feedback. > > Index: sys/net/if_wg.c > > === > > RCS file: /cvs/src/sys/net/if_wg.c,v > > retrieving revision 1.36 > > diff -u -p -r1.36 if_wg.c > > --- sys/net/if_wg.c 18 Jan 2024 08:46:41 - 1.36 > > +++ sys/net/if_wg.c 28 Feb 2024 14:49:16 - > > @@ -150,8 +150,8 @@ struct wg_index { > > }; > > > > struct wg_timers { > > - /* t_lock is for blocking wg_timers_event_* when setting t_disabled. */ > > - struct rwlockt_lock; > > + /* t_mtx is for blocking wg_timers_event_* when setting t_disabled. */ > > + struct mutex t_mtx; > > > > int t_disabled; > > int t_need_another_keepalive; > > @@ -930,7 +930,7 @@ void > > wg_timers_init(struct wg_timers *t) > > { > > bzero(t, sizeof(*t)); > > - rw_init(>t_lock, "wg_timers"); > > + mtx_init_flags(>t_mtx, IPL_NET, "wg_timers", 0); > > mtx_init(>t_handshake_mtx, IPL_NET); > > > > timeout_set(>t_new_handshake, wg_timers_run_new_handshake, t); > > @@ -945,19 +945,19 @@ wg_timers_init(struct wg_timers *t) > > void > > wg_timers_enable(struct wg_timers *t) > > { > > - rw_enter_write(>t_lock); > > + mtx_enter(>t_mtx); > > t->t_disabled = 0; > > - rw_exit_write(>t_lock); > > + mtx_leave(>t_mtx); > > wg_timers_run_persistent_keepalive(t); > > } > > > > void > > wg_timers_disable(struct wg_timers *t) > > { > > - rw_enter_write(>t_lock); > > + mtx_enter(>t_mtx); > > t->t_disabled = 1; > > t->t_need_another_keepalive = 0; > > - rw_exit_write(>t_lock); > > + mtx_leave(>t_mtx); > > > > timeout_del_barrier(>t_new_handshake); > > timeout_del_barrier(>t_send_keepalive); > > @@ -969,12 +969,12 @@ wg_timers_disable(struct wg_timers *t) > > void > > wg_timers_set_persistent_keepalive(struct wg_timers *t, uint16_t interval) > > { > > - rw_enter_read(>t_lock); > > + mtx_enter(>t_mtx); > > if (!t->t_disabled) { > > t->t_persistent_keepalive_interval = interval; > > wg_timers_run_persistent_keepalive(t); > > } > > - rw_exit_read(>t_lock); > > + mtx_leave(>t_mtx); > > } > > > > int > > @@ -1020,16 +1020,16 @@ wg_timers_event_data_sent(struct wg_time > > int msecs = NEW_HANDSHAKE_TIMEOUT * 1000; > > msecs += arc4random_uniform(REKEY_TIMEOUT_JITTER); > >
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 06:03:14PM +0300, Vitaliy Makkoveev wrote: > On Wed, Feb 28, 2024 at 02:45:44PM +0100, Martin Pieuchot wrote: > > > > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() > > > > > schedules > > > > > (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) > > > > > in > > > > > (*if_qstart)() handler are in risk. > > > > > > > > > > What about to always schedule (*if_qstart)()? > > > > > > > > Why would you want to introduce additional latence? > > > > > > > > > > I suppose it the less evil than strictly deny rwlocks in (*if_qstart)(). > > > Anyway it will be scheduled unless `seq_len' exceeds the watermark. > > > > Please no. This is not going to happen. wg(4) has to be fixed. Let's > > not change the design of the kernel every time a bug is found. > > > > I'm not the fan of ifq_start() behaviour. > > wg(4) needs to convert `t_lock', `r_keypair_lock' and `c_lock' rwlocks > to mutexes. I used mtx_init_flags() to keep existing names. You do not mention if you verified that any of those codepath may sleep or not. I don't have the time to go over all those codepaths and check. > Index: sys/net/if_wg.c > === > RCS file: /cvs/src/sys/net/if_wg.c,v > retrieving revision 1.36 > diff -u -p -r1.36 if_wg.c > --- sys/net/if_wg.c 18 Jan 2024 08:46:41 - 1.36 > +++ sys/net/if_wg.c 28 Feb 2024 14:49:16 - > @@ -150,8 +150,8 @@ struct wg_index { > }; > > struct wg_timers { > - /* t_lock is for blocking wg_timers_event_* when setting t_disabled. */ > - struct rwlockt_lock; > + /* t_mtx is for blocking wg_timers_event_* when setting t_disabled. */ > + struct mutex t_mtx; > > int t_disabled; > int t_need_another_keepalive; > @@ -930,7 +930,7 @@ void > wg_timers_init(struct wg_timers *t) > { > bzero(t, sizeof(*t)); > - rw_init(>t_lock, "wg_timers"); > + mtx_init_flags(>t_mtx, IPL_NET, "wg_timers", 0); > mtx_init(>t_handshake_mtx, IPL_NET); > > timeout_set(>t_new_handshake, wg_timers_run_new_handshake, t); > @@ -945,19 +945,19 @@ wg_timers_init(struct wg_timers *t) > void > wg_timers_enable(struct wg_timers *t) > { > - rw_enter_write(>t_lock); > + mtx_enter(>t_mtx); > t->t_disabled = 0; > - rw_exit_write(>t_lock); > + mtx_leave(>t_mtx); > wg_timers_run_persistent_keepalive(t); > } > > void > wg_timers_disable(struct wg_timers *t) > { > - rw_enter_write(>t_lock); > + mtx_enter(>t_mtx); > t->t_disabled = 1; > t->t_need_another_keepalive = 0; > - rw_exit_write(>t_lock); > + mtx_leave(>t_mtx); > > timeout_del_barrier(>t_new_handshake); > timeout_del_barrier(>t_send_keepalive); > @@ -969,12 +969,12 @@ wg_timers_disable(struct wg_timers *t) > void > wg_timers_set_persistent_keepalive(struct wg_timers *t, uint16_t interval) > { > - rw_enter_read(>t_lock); > + mtx_enter(>t_mtx); > if (!t->t_disabled) { > t->t_persistent_keepalive_interval = interval; > wg_timers_run_persistent_keepalive(t); > } > - rw_exit_read(>t_lock); > + mtx_leave(>t_mtx); > } > > int > @@ -1020,16 +1020,16 @@ wg_timers_event_data_sent(struct wg_time > int msecs = NEW_HANDSHAKE_TIMEOUT * 1000; > msecs += arc4random_uniform(REKEY_TIMEOUT_JITTER); > > - rw_enter_read(>t_lock); > + mtx_enter(>t_mtx); > if (!t->t_disabled && !timeout_pending(>t_new_handshake)) > timeout_add_msec(>t_new_handshake, msecs); > - rw_exit_read(>t_lock); > + mtx_leave(>t_mtx); > } > > void > wg_timers_event_data_received(struct wg_timers *t) > { > - rw_enter_read(>t_lock); > + mtx_enter(>t_mtx); > if (!t->t_disabled) { > if (!timeout_pending(>t_send_keepalive)) > timeout_add_sec(>t_send_keepalive, > @@ -1037,7 +1037,7 @@ wg_timers_event_data_received(struct wg_ > else > t->t_need_another_keepalive = 1; > } > - rw_exit_read(>t_lock); > + mtx_leave(>t_mtx); > } > > void > @@ -1055,11 +1055,11 @@ wg_timers_event_any_authenticated_packet > void > wg_timers_event_any_authenticated_packet_traversal(struct wg_timers *t) > { > - rw_enter_read(>t_lock); > + mtx_enter(>t_mtx); > if (!t->t_disabled && t->t_persistent_keepalive_interval > 0) > timeout_add_sec(>t_persistent_keepalive, > t->t_persistent_keepalive_interval); > - rw_exit_read(>t_lock); > + mtx_leave(>t_mtx); > } > > void > @@ -1068,10 +1068,10 @@ wg_timers_event_handshake_initiated(stru > int msecs = REKEY_TIMEOUT * 1000; > msecs += arc4random_uniform(REKEY_TIMEOUT_JITTER); > > - rw_enter_read(>t_lock); > + mtx_enter(>t_mtx); > if (!t->t_disabled) >
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 02:45:44PM +0100, Martin Pieuchot wrote: > > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() schedules > > > > (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in > > > > (*if_qstart)() handler are in risk. > > > > > > > > What about to always schedule (*if_qstart)()? > > > > > > Why would you want to introduce additional latence? > > > > > > > I suppose it the less evil than strictly deny rwlocks in (*if_qstart)(). > > Anyway it will be scheduled unless `seq_len' exceeds the watermark. > > Please no. This is not going to happen. wg(4) has to be fixed. Let's > not change the design of the kernel every time a bug is found. > I'm not the fan of ifq_start() behaviour. wg(4) needs to convert `t_lock', `r_keypair_lock' and `c_lock' rwlocks to mutexes. I used mtx_init_flags() to keep existing names. Index: sys/net/if_wg.c === RCS file: /cvs/src/sys/net/if_wg.c,v retrieving revision 1.36 diff -u -p -r1.36 if_wg.c --- sys/net/if_wg.c 18 Jan 2024 08:46:41 - 1.36 +++ sys/net/if_wg.c 28 Feb 2024 14:49:16 - @@ -150,8 +150,8 @@ struct wg_index { }; struct wg_timers { - /* t_lock is for blocking wg_timers_event_* when setting t_disabled. */ - struct rwlockt_lock; + /* t_mtx is for blocking wg_timers_event_* when setting t_disabled. */ + struct mutex t_mtx; int t_disabled; int t_need_another_keepalive; @@ -930,7 +930,7 @@ void wg_timers_init(struct wg_timers *t) { bzero(t, sizeof(*t)); - rw_init(>t_lock, "wg_timers"); + mtx_init_flags(>t_mtx, IPL_NET, "wg_timers", 0); mtx_init(>t_handshake_mtx, IPL_NET); timeout_set(>t_new_handshake, wg_timers_run_new_handshake, t); @@ -945,19 +945,19 @@ wg_timers_init(struct wg_timers *t) void wg_timers_enable(struct wg_timers *t) { - rw_enter_write(>t_lock); + mtx_enter(>t_mtx); t->t_disabled = 0; - rw_exit_write(>t_lock); + mtx_leave(>t_mtx); wg_timers_run_persistent_keepalive(t); } void wg_timers_disable(struct wg_timers *t) { - rw_enter_write(>t_lock); + mtx_enter(>t_mtx); t->t_disabled = 1; t->t_need_another_keepalive = 0; - rw_exit_write(>t_lock); + mtx_leave(>t_mtx); timeout_del_barrier(>t_new_handshake); timeout_del_barrier(>t_send_keepalive); @@ -969,12 +969,12 @@ wg_timers_disable(struct wg_timers *t) void wg_timers_set_persistent_keepalive(struct wg_timers *t, uint16_t interval) { - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled) { t->t_persistent_keepalive_interval = interval; wg_timers_run_persistent_keepalive(t); } - rw_exit_read(>t_lock); + mtx_leave(>t_mtx); } int @@ -1020,16 +1020,16 @@ wg_timers_event_data_sent(struct wg_time int msecs = NEW_HANDSHAKE_TIMEOUT * 1000; msecs += arc4random_uniform(REKEY_TIMEOUT_JITTER); - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled && !timeout_pending(>t_new_handshake)) timeout_add_msec(>t_new_handshake, msecs); - rw_exit_read(>t_lock); + mtx_leave(>t_mtx); } void wg_timers_event_data_received(struct wg_timers *t) { - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled) { if (!timeout_pending(>t_send_keepalive)) timeout_add_sec(>t_send_keepalive, @@ -1037,7 +1037,7 @@ wg_timers_event_data_received(struct wg_ else t->t_need_another_keepalive = 1; } - rw_exit_read(>t_lock); + mtx_leave(>t_mtx); } void @@ -1055,11 +1055,11 @@ wg_timers_event_any_authenticated_packet void wg_timers_event_any_authenticated_packet_traversal(struct wg_timers *t) { - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled && t->t_persistent_keepalive_interval > 0) timeout_add_sec(>t_persistent_keepalive, t->t_persistent_keepalive_interval); - rw_exit_read(>t_lock); + mtx_leave(>t_mtx); } void @@ -1068,10 +1068,10 @@ wg_timers_event_handshake_initiated(stru int msecs = REKEY_TIMEOUT * 1000; msecs += arc4random_uniform(REKEY_TIMEOUT_JITTER); - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled) timeout_add_msec(>t_retry_handshake, msecs); - rw_exit_read(>t_lock); + mtx_leave(>t_mtx); } void @@ -1085,7 +1085,7 @@ wg_timers_event_handshake_responded(stru void wg_timers_event_handshake_complete(struct wg_timers *t) { - rw_enter_read(>t_lock); + mtx_enter(>t_mtx); if (!t->t_disabled) { mtx_enter(>t_handshake_mtx);
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On 28/02/24(Wed) 16:39, Vitaliy Makkoveev wrote: > On Wed, Feb 28, 2024 at 02:22:31PM +0100, Mark Kettenis wrote: > > > Date: Wed, 28 Feb 2024 16:16:09 +0300 > > > From: Vitaliy Makkoveev > > > > > > On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote: > > > > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > > > > Hi, > > > > > > > > > > thank you for looking into it, and for the advice. > > > > > > > > > > On Wed, 28 Feb 2024 10:13:06 + > > > > > Stuart Henderson wrote: > > > > > > > > > > > Please try to re-type at least the most important bits from a > > > > > > screenshot so readers can quickly see which subsystems are involved. > > > > > > > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > > > > > > > If you have any advice on what should I do if it happens again in > > > > > order > > > > > to get as much info for debuggers as possible, please let me know. > > > > > > > > > > splassert: assertwaitok: want 0 have 4 > > > > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > > > > "/usr/src/sys/kern/kern_sched.c", line 373 > > > > > Stopped at db_enter+0x14: popq %rbp > > > > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > > > > 199248 36172 577 0x10 01 openvpn > > > > > 490874 474460 0x14000 0x2002 wg_handshake > > > > > 71544 93110 0x14000 0x2003 softnet0 > > > > > db_enter() at db_enter+0x14 > > > > > panic(820a4b9f) at panic+0xc3 > > > > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > > > > assert+0x29 > > > > > sched_chooseproc() at sched_chooseproc+0x26d > > > > > mi_switch() at mi_switch+0x17f > > > > > sleep_finish(0,1) at sleep_finish+0x107 > > > > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > > > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > > > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > > > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > > > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > > > > softclock_process_tick_timeout(8115e248,1) at > > > > > softclock_process_tick_timeout+0xfb > > > > > softclock(0) at softclock+0xb8 > > > > > softintr_dispatch(0) at softintr_dispatch+0xeb > > > > > end trace frame: 0x800020dbc730, count:0 > > > > > > > > > > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > > > > when called from inside a timeout aka softclock? This is interrupt > > > > context > > > > code is not allowed to sleep there. > > > > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() schedules > > > (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in > > > (*if_qstart)() handler are in risk. > > > > > > What about to always schedule (*if_qstart)()? > > > > Why would you want to introduce additional latence? > > > > I suppose it the less evil than strictly deny rwlocks in (*if_qstart)(). > Anyway it will be scheduled unless `seq_len' exceeds the watermark. Please no. This is not going to happen. wg(4) has to be fixed. Let's not change the design of the kernel every time a bug is found.
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 02:37:31PM +0100, Martin Pieuchot wrote: > On 28/02/24(Wed) 12:36, Claudio Jeker wrote: > > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > > Hi, > > > > > > thank you for looking into it, and for the advice. > > > > > > On Wed, 28 Feb 2024 10:13:06 + > > > Stuart Henderson wrote: > > > > > > > Please try to re-type at least the most important bits from a > > > > screenshot so readers can quickly see which subsystems are involved. > > > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > > > If you have any advice on what should I do if it happens again in order > > > to get as much info for debuggers as possible, please let me know. > > > > > > splassert: assertwaitok: want 0 have 4 > > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > > "/usr/src/sys/kern/kern_sched.c", line 373 > > > Stopped at db_enter+0x14: popq %rbp > > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > > 199248 36172 577 0x10 01 openvpn > > > 490874 474460 0x14000 0x2002 wg_handshake > > > 71544 93110 0x14000 0x2003 softnet0 > > > db_enter() at db_enter+0x14 > > > panic(820a4b9f) at panic+0xc3 > > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > > assert+0x29 > > > sched_chooseproc() at sched_chooseproc+0x26d > > > mi_switch() at mi_switch+0x17f > > > sleep_finish(0,1) at sleep_finish+0x107 > > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > > softclock_process_tick_timeout(8115e248,1) at > > > softclock_process_tick_timeout+0xfb > > > softclock(0) at softclock+0xb8 > > > softintr_dispatch(0) at softintr_dispatch+0xeb > > > end trace frame: 0x800020dbc730, count:0 > > > > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > > when called from inside a timeout aka softclock? This is interrupt context > > code is not allowed to sleep there. > > It is indeed. It isn't clear to me why `r_keypair_lock' is a rwlock and > not a mutex. Does anything sleep in this code? Why did they pick > sleeping locks in the first place? Maybe because multiple readers are possible with an rwlock. -- :wq Claudio
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 02:22:31PM +0100, Mark Kettenis wrote: > > Date: Wed, 28 Feb 2024 16:16:09 +0300 > > From: Vitaliy Makkoveev > > > > On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote: > > > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > > > Hi, > > > > > > > > thank you for looking into it, and for the advice. > > > > > > > > On Wed, 28 Feb 2024 10:13:06 + > > > > Stuart Henderson wrote: > > > > > > > > > Please try to re-type at least the most important bits from a > > > > > screenshot so readers can quickly see which subsystems are involved. > > > > > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > > > > > If you have any advice on what should I do if it happens again in order > > > > to get as much info for debuggers as possible, please let me know. > > > > > > > > splassert: assertwaitok: want 0 have 4 > > > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > > > "/usr/src/sys/kern/kern_sched.c", line 373 > > > > Stopped at db_enter+0x14: popq %rbp > > > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > > > 199248 36172 577 0x10 01 openvpn > > > > 490874 474460 0x14000 0x2002 wg_handshake > > > > 71544 93110 0x14000 0x2003 softnet0 > > > > db_enter() at db_enter+0x14 > > > > panic(820a4b9f) at panic+0xc3 > > > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > > > assert+0x29 > > > > sched_chooseproc() at sched_chooseproc+0x26d > > > > mi_switch() at mi_switch+0x17f > > > > sleep_finish(0,1) at sleep_finish+0x107 > > > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > > > softclock_process_tick_timeout(8115e248,1) at > > > > softclock_process_tick_timeout+0xfb > > > > softclock(0) at softclock+0xb8 > > > > softintr_dispatch(0) at softintr_dispatch+0xeb > > > > end trace frame: 0x800020dbc730, count:0 > > > > > > > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > > > when called from inside a timeout aka softclock? This is interrupt context > > > code is not allowed to sleep there. > > > > > > > Not only wg(4). Depends on interface queue usage, ifq_start() schedules > > (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in > > (*if_qstart)() handler are in risk. > > > > What about to always schedule (*if_qstart)()? > > Why would you want to introduce additional latence? > I suppose it the less evil than strictly deny rwlocks in (*if_qstart)(). Anyway it will be scheduled unless `seq_len' exceeds the watermark.
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On 28/02/24(Wed) 12:36, Claudio Jeker wrote: > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > Hi, > > > > thank you for looking into it, and for the advice. > > > > On Wed, 28 Feb 2024 10:13:06 + > > Stuart Henderson wrote: > > > > > Please try to re-type at least the most important bits from a > > > screenshot so readers can quickly see which subsystems are involved. > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > If you have any advice on what should I do if it happens again in order > > to get as much info for debuggers as possible, please let me know. > > > > splassert: assertwaitok: want 0 have 4 > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > "/usr/src/sys/kern/kern_sched.c", line 373 > > Stopped at db_enter+0x14: popq %rbp > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > 199248 36172 577 0x10 01 openvpn > > 490874 474460 0x14000 0x2002 wg_handshake > > 71544 93110 0x14000 0x2003 softnet0 > > db_enter() at db_enter+0x14 > > panic(820a4b9f) at panic+0xc3 > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > assert+0x29 > > sched_chooseproc() at sched_chooseproc+0x26d > > mi_switch() at mi_switch+0x17f > > sleep_finish(0,1) at sleep_finish+0x107 > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > softclock_process_tick_timeout(8115e248,1) at > > softclock_process_tick_timeout+0xfb > > softclock(0) at softclock+0xb8 > > softintr_dispatch(0) at softintr_dispatch+0xeb > > end trace frame: 0x800020dbc730, count:0 > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > when called from inside a timeout aka softclock? This is interrupt context > code is not allowed to sleep there. It is indeed. It isn't clear to me why `r_keypair_lock' is a rwlock and not a mutex. Does anything sleep in this code? Why did they pick sleeping locks in the first place?
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
> Date: Wed, 28 Feb 2024 16:16:09 +0300 > From: Vitaliy Makkoveev > > On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote: > > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > > Hi, > > > > > > thank you for looking into it, and for the advice. > > > > > > On Wed, 28 Feb 2024 10:13:06 + > > > Stuart Henderson wrote: > > > > > > > Please try to re-type at least the most important bits from a > > > > screenshot so readers can quickly see which subsystems are involved. > > > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > > > If you have any advice on what should I do if it happens again in order > > > to get as much info for debuggers as possible, please let me know. > > > > > > splassert: assertwaitok: want 0 have 4 > > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > > "/usr/src/sys/kern/kern_sched.c", line 373 > > > Stopped at db_enter+0x14: popq %rbp > > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > > 199248 36172 577 0x10 01 openvpn > > > 490874 474460 0x14000 0x2002 wg_handshake > > > 71544 93110 0x14000 0x2003 softnet0 > > > db_enter() at db_enter+0x14 > > > panic(820a4b9f) at panic+0xc3 > > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > > assert+0x29 > > > sched_chooseproc() at sched_chooseproc+0x26d > > > mi_switch() at mi_switch+0x17f > > > sleep_finish(0,1) at sleep_finish+0x107 > > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > > softclock_process_tick_timeout(8115e248,1) at > > > softclock_process_tick_timeout+0xfb > > > softclock(0) at softclock+0xb8 > > > softintr_dispatch(0) at softintr_dispatch+0xeb > > > end trace frame: 0x800020dbc730, count:0 > > > > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > > when called from inside a timeout aka softclock? This is interrupt context > > code is not allowed to sleep there. > > > > Not only wg(4). Depends on interface queue usage, ifq_start() schedules > (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in > (*if_qstart)() handler are in risk. > > What about to always schedule (*if_qstart)()? Why would you want to introduce additional latence? > Index: sys/net/hfsc.c > === > RCS file: /cvs/src/sys/net/hfsc.c,v > retrieving revision 1.49 > diff -u -p -r1.49 hfsc.c > --- sys/net/hfsc.c11 Apr 2023 00:45:09 - 1.49 > +++ sys/net/hfsc.c28 Feb 2024 13:15:22 - > @@ -953,8 +953,7 @@ hfsc_deferred(void *arg) > if (!HFSC_ENABLED(ifq)) > return; > > - if (!ifq_empty(ifq)) > - ifq_start(ifq); > + ifq_start_deferred(ifq); > > hif = ifq_q_enter(>if_snd, ifq_hfsc_ops); > if (hif == NULL) > Index: sys/net/ifq.c > === > RCS file: /cvs/src/sys/net/ifq.c,v > retrieving revision 1.53 > diff -u -p -r1.53 ifq.c > --- sys/net/ifq.c 10 Nov 2023 15:51:24 - 1.53 > +++ sys/net/ifq.c 28 Feb 2024 13:15:22 - > @@ -133,6 +133,12 @@ ifq_start(struct ifqueue *ifq) > } else > task_add(ifq->ifq_softnet, >ifq_bundle); > } > +void > +ifq_start_deferred(struct ifqueue *ifq) > +{ > + if (ifq_len(ifq)) > + task_add(ifq->ifq_softnet, >ifq_bundle); > +} > > void > ifq_start_task(void *p) > Index: sys/net/ifq.h > === > RCS file: /cvs/src/sys/net/ifq.h,v > retrieving revision 1.41 > diff -u -p -r1.41 ifq.h > --- sys/net/ifq.h 10 Nov 2023 15:51:24 - 1.41 > +++ sys/net/ifq.h 28 Feb 2024 13:15:22 - > @@ -430,6 +430,7 @@ void ifq_destroy(struct ifqueue *); > void ifq_add_data(struct ifqueue *, struct if_data *); > int ifq_enqueue(struct ifqueue *, struct mbuf *); > void ifq_start(struct ifqueue *); > +void ifq_start_deferred(struct ifqueue *); > struct mbuf *ifq_deq_begin(struct ifqueue *); > void ifq_deq_commit(struct ifqueue *, struct mbuf *); > void ifq_deq_rollback(struct ifqueue *, struct mbuf *); > >
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote: > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > Hi, > > > > thank you for looking into it, and for the advice. > > > > On Wed, 28 Feb 2024 10:13:06 + > > Stuart Henderson wrote: > > > > > Please try to re-type at least the most important bits from a > > > screenshot so readers can quickly see which subsystems are involved. > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > If you have any advice on what should I do if it happens again in order > > to get as much info for debuggers as possible, please let me know. > > > > splassert: assertwaitok: want 0 have 4 > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > "/usr/src/sys/kern/kern_sched.c", line 373 > > Stopped at db_enter+0x14: popq %rbp > >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > > 199248 36172 577 0x10 01 openvpn > > 490874 474460 0x14000 0x2002 wg_handshake > > 71544 93110 0x14000 0x2003 softnet0 > > db_enter() at db_enter+0x14 > > panic(820a4b9f) at panic+0xc3 > > __assert(82121fcb,8209ae5f,175,82092fbf) at > > assert+0x29 > > sched_chooseproc() at sched_chooseproc+0x26d > > mi_switch() at mi_switch+0x17f > > sleep_finish(0,1) at sleep_finish+0x107 > > rw_enter(88003cf0,2) at rw_enter+0x1ad > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > > wg_qstart(fff80a622a8) at wg_qstart+0x18c > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > > softclock_process_tick_timeout(8115e248,1) at > > softclock_process_tick_timeout+0xfb > > softclock(0) at softclock+0xb8 > > softintr_dispatch(0) at softintr_dispatch+0xeb > > end trace frame: 0x800020dbc730, count:0 > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > when called from inside a timeout aka softclock? This is interrupt context > code is not allowed to sleep there. > Not only wg(4). Depends on interface queue usage, ifq_start() schedules (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in (*if_qstart)() handler are in risk. What about to always schedule (*if_qstart)()? Index: sys/net/hfsc.c === RCS file: /cvs/src/sys/net/hfsc.c,v retrieving revision 1.49 diff -u -p -r1.49 hfsc.c --- sys/net/hfsc.c 11 Apr 2023 00:45:09 - 1.49 +++ sys/net/hfsc.c 28 Feb 2024 13:15:22 - @@ -953,8 +953,7 @@ hfsc_deferred(void *arg) if (!HFSC_ENABLED(ifq)) return; - if (!ifq_empty(ifq)) - ifq_start(ifq); + ifq_start_deferred(ifq); hif = ifq_q_enter(>if_snd, ifq_hfsc_ops); if (hif == NULL) Index: sys/net/ifq.c === RCS file: /cvs/src/sys/net/ifq.c,v retrieving revision 1.53 diff -u -p -r1.53 ifq.c --- sys/net/ifq.c 10 Nov 2023 15:51:24 - 1.53 +++ sys/net/ifq.c 28 Feb 2024 13:15:22 - @@ -133,6 +133,12 @@ ifq_start(struct ifqueue *ifq) } else task_add(ifq->ifq_softnet, >ifq_bundle); } +void +ifq_start_deferred(struct ifqueue *ifq) +{ + if (ifq_len(ifq)) + task_add(ifq->ifq_softnet, >ifq_bundle); +} void ifq_start_task(void *p) Index: sys/net/ifq.h === RCS file: /cvs/src/sys/net/ifq.h,v retrieving revision 1.41 diff -u -p -r1.41 ifq.h --- sys/net/ifq.h 10 Nov 2023 15:51:24 - 1.41 +++ sys/net/ifq.h 28 Feb 2024 13:15:22 - @@ -430,6 +430,7 @@ void ifq_destroy(struct ifqueue *); voidifq_add_data(struct ifqueue *, struct if_data *); int ifq_enqueue(struct ifqueue *, struct mbuf *); voidifq_start(struct ifqueue *); +voidifq_start_deferred(struct ifqueue *); struct mbuf*ifq_deq_begin(struct ifqueue *); voidifq_deq_commit(struct ifqueue *, struct mbuf *); voidifq_deq_rollback(struct ifqueue *, struct mbuf *);
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > Hi, > > thank you for looking into it, and for the advice. > > On Wed, 28 Feb 2024 10:13:06 + > Stuart Henderson wrote: > > > Please try to re-type at least the most important bits from a > > screenshot so readers can quickly see which subsystems are involved. > > Below is manual transcript of whole screenshot, hopefully no typos. > > If you have any advice on what should I do if it happens again in order > to get as much info for debuggers as possible, please let me know. > > splassert: assertwaitok: want 0 have 4 > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > "/usr/src/sys/kern/kern_sched.c", line 373 > Stopped at db_enter+0x14: popq %rbp >TIDPID UID PRFLAGS PFLAGS CPU COMMAND > 199248 36172 577 0x10 01 openvpn > 490874 474460 0x14000 0x2002 wg_handshake > 71544 93110 0x14000 0x2003 softnet0 > db_enter() at db_enter+0x14 > panic(820a4b9f) at panic+0xc3 > __assert(82121fcb,8209ae5f,175,82092fbf) at > assert+0x29 > sched_chooseproc() at sched_chooseproc+0x26d > mi_switch() at mi_switch+0x17f > sleep_finish(0,1) at sleep_finish+0x107 > rw_enter(88003cf0,2) at rw_enter+0x1ad > noise_remote_ready(88003bf0) at noise_remote_ready+0x33 > wg_qstart(fff80a622a8) at wg_qstart+0x18c > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd > hfsc_deferred(80a62000) at hfsc_deferred+0x68 > softclock_process_tick_timeout(8115e248,1) at > softclock_process_tick_timeout+0xfb > softclock(0) at softclock+0xb8 > softintr_dispatch(0) at softintr_dispatch+0xeb > end trace frame: 0x800020dbc730, count:0 > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work when called from inside a timeout aka softclock? This is interrupt context code is not allowed to sleep there. -- :wq Claudio
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
Hi, thank you for looking into it, and for the advice. On Wed, 28 Feb 2024 10:13:06 + Stuart Henderson wrote: > Please try to re-type at least the most important bits from a > screenshot so readers can quickly see which subsystems are involved. Below is manual transcript of whole screenshot, hopefully no typos. If you have any advice on what should I do if it happens again in order to get as much info for debuggers as possible, please let me know. splassert: assertwaitok: want 0 have 4 panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file "/usr/src/sys/kern/kern_sched.c", line 373 Stopped at db_enter+0x14: popq %rbp TIDPID UID PRFLAGS PFLAGS CPU COMMAND 199248 36172 577 0x10 01 openvpn 490874 474460 0x14000 0x2002 wg_handshake 71544 93110 0x14000 0x2003 softnet0 db_enter() at db_enter+0x14 panic(820a4b9f) at panic+0xc3 __assert(82121fcb,8209ae5f,175,82092fbf) at assert+0x29 sched_chooseproc() at sched_chooseproc+0x26d mi_switch() at mi_switch+0x17f sleep_finish(0,1) at sleep_finish+0x107 rw_enter(88003cf0,2) at rw_enter+0x1ad noise_remote_ready(88003bf0) at noise_remote_ready+0x33 wg_qstart(fff80a622a8) at wg_qstart+0x18c ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd hfsc_deferred(80a62000) at hfsc_deferred+0x68 softclock_process_tick_timeout(8115e248,1) at softclock_process_tick_timeout+0xfb softclock(0) at softclock+0xb8 softintr_dispatch(0) at softintr_dispatch+0xeb end trace frame: 0x800020dbc730, count:0 -- Before enlightenment - chop wood, draw water. After enlightenment - chop wood, draw water. Marko Cupać https://www.mimar.rs/
Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
Please try to re-type at least the most important bits from a screenshot so readers can quickly see which subsystems are involved. splassert: assertwaitok: want 0 have 4 assertion p->wchan == NULL failed, kern_sched.c line 373 active procs: openvpn wg_handshake softnet0 trace includes rw_enter, sleep_finish, noise_remote_ready, wg_qstart, ifq_serialize, hfsc_deferred On 2024/02/28 10:12, Marko Cupać wrote: > Hi, > > I have fairly busy CARP pair which serves as a hub for ~30 spokes which > tunnel to it over combination of GRE/IPSEC, tunnel mode ipsec and > wireguard. It also serves as host-to-site VPN concentrator for npppd, > openvpn and wireguard clients. Besides firewalling, pf also does prio > and queue, as well as pflow. There's also read-only snmpd for querying > interface statistics from another snmpd_exporter / prometheus / grafana > host. > > Yesterday one of the two crashed (7.4, syspatched up to 011_ssh). I have > attached photo (sorry I had no access to serial console). > > I'd be glad to provide more information. > > Below's dmesg. > > OpenBSD 7.4 (GENERIC.MP) #2: Fri Dec 8 15:39:04 MST 2023 > > r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 17027289088 (16238MB) > avail mem = 16491524096 (15727MB) > random: good seed from bootblocks > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x788c5000 (241 entries) > bios0: vendor HP version "P89" date 11/23/2021 > bios0: HP ProLiant DL360 Gen9 > efi0 at bios0: UEFI 2.4 > efi0: HP rev 0x25c00 > acpi0 at bios0: ACPI 5.0 > acpi0: sleep states S0 S5 > acpi0: tables DSDT FACP UEFI MCEJ SSDT HEST BERT ERST EINJ BGRT HPET PMCT > WDDT APIC MCFG SLIT SRAT SPMI RASF SPCR MSCT BDAT PCCT DMAR SSDT SSDT SSDT > acpi0: wakeup devices PEX4(S4) BR05(S4) BR03(S4) BR07(S4) > acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpihpet0 at acpi0: 14318179 Hz > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz, 2597.01 MHz, 06-4f-01, patch > 0b40 > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB > 64b/line 8-way L2 cache, 10MB 64b/line 20-way L3 cache > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE > cpu1 at mainbus0: apid 2 (application processor) > cpu1: Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz, 2597.02 MHz, 06-4f-01, patch > 0b40 > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB > 64b/line 8-way L2 cache, 10MB 64b/line 20-way L3 cache > cpu1: smt 0, core 1, package 0 > cpu2 at mainbus0: apid 4 (application processor) > cpu2: Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz, 2597.07 MHz, 06-4f-01, patch > 0b40 > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB > 64b/line 8-way L2 cache, 10MB 64b/line 20-way L3 cache > cpu2: smt 0, core 2, package 0 > cpu3 at mainbus0: apid 6 (application processor) > cpu3: Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz, 2597.06 MHz, 06-4f-01, patch > 0b40 > cpu3: >
panic: kernel diagnostic assertion "p->p_wchan == NULL" failed
>Synopsis: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed >Category: kernel >Environment: System : OpenBSD 6.7 Details : OpenBSD 6.7-current (GENERIC.MP) #359: Sun Jul 19 01:36:17 MDT 2020 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP Architecture: OpenBSD.amd64 Machine : amd64 >Description: splassert: assertwaitok: want 0 have 4 panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file "/usr/src/sys/kern/kern_sched.c", line 353 Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND db_enter() at db_enter+0x10 panic(81e00828) at panic+0x128 __assert(81e662e1,81df182a,161,81dead80) at __assert+0x2b sched_chooseproc() at sched_chooseproc+0x23c mi_switch() at mi_switch+0x16a sleep_finish(800022520218,1) at sleep_finish+0x84 sleep_finish_all(800022520218,1) at sleep_finish_all+0x21 tsleep(80ac7200,119,81e5f7e9,1) at tsleep+0xd6 ttyretype(80ac7200) at ttyretype+0x2c1 ttyrub(61,80ac7200) at ttyrub+0x296 ttyinput(7f,80ac7200) at ttyinput+0x894 wsdisplay_kbdinput(80aa4800,8200,8214fee0,1) at wsdisplay_kbdinput+0xaa wskbd_repeat(8059f400) at wskbd_repeat+0x54 softclock(0) at softclock+0x143 end trace frame: 0x800022520510, count: 0 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}>PID TID PPIDUID S FLAGS WAIT COMMAND 95235 417386 13622 1000 30x100083 nanosleep sleep 92726 714 1 0 30x100083 ttyin getty 54535 367099 1 0 30x100083 ttyin getty 61298 377358 1 0 30x100083 ttyin getty 50914 25032 1 0 30x100083 ttyin getty 13622 256267 1 1000 30x8b pause zsh 27970 89345 1 0 30x100098 poll cron 27086 124705 1 0 30x80 kqreadapmd 635 283574 1 99 30x100090 poll sndiod 94106 27 1110 30x100090 poll sndiod 15101 427156 50294 95 30x100092 kqreadsmtpd 2622 447026 50294103 30x100092 kqreadsmtpd 8133 120267 50294 95 30x100092 kqreadsmtpd 58744 371200 50294 95 30x100092 kqreadsmtpd 6766 246437 50294 95 30x100092 kqreadsmtpd 73248 319291 50294 95 30x100092 kqreadsmtpd 50294 359602 1 0 30x100080 kqreadsmtpd 69630 152270 1 0 30x80 selectsshd 42837 498062 1 0 30x100080 poll ntpd 93719 309023 88599 83 30x100092 poll ntpd 88599 505189 1 83 30x100092 poll ntpd 46954 486069 88747 74 30x100092 bpf pflogd 88747 191508 1 0 30x80 netio pflogd 27991 489183 76346 73 30x100090 kqreadsyslogd 76346 22579 1 0 30x100082 netio syslogd 3500 500706 1 77 30x100090 poll dhclient 34202 74190 1 0 30x80 poll dhclient 25647 229537 39826115 30x100092 kqreadslaacd 93406 409450 39826115 30x100092 kqreadslaacd 39826 140627 1 0 30x100080 kqreadslaacd 83062 368808 0 0 3 0x14200 bored i915-userptr-acq 36069 209829 0 0 3 0x14200 bored i915_flip 93437 164764 0 0 3 0x14200 bored i915_modeset 15782 454241 0 0 3 0x14200 bored i915-dp 47677 30967 0 0 3 0x14200 bored i915 12555 298661 0 0 3 0x14200 bored smr 99642 122374 0 0 3 0x14200 pgzerozerothread 38899 508278 0 0 3 0x14200 aiodoned aiodoned 83433 375664 0 0 3 0x14200 syncerupdate 57438 226169 0 0 3 0x14200 cleaner cleaner 50849 208387 0 0 3 0x14200 reaperreaper 76534 345574 0 0 3 0x14200 pgdaemon pagedaemon 25908 84585 0 0 3 0x14200 bored srdis 983 490844 0 0