On 22/02/24(Thu) 17:24, Claudio Jeker wrote: > On Thu, Feb 22, 2024 at 04:16:57PM +0100, Martin Pieuchot wrote: > > On 21/02/24(Wed) 13:05, Claudio Jeker wrote: > > > On Tue, Feb 20, 2024 at 09:34:12PM +0100, Martin Pieuchot wrote: > > > > On 28/10/21(Thu) 05:45, Visa Hankala wrote: > > > > > On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote: > > > > > > Dave Voutila <d...@sisu.io> writes: > > > > > > > > > > > > > Was tinkering on a bt(5) script for trying to debug an issue in > > > > > > > vmm(4) > > > > > > > when I managed to start hitting a panic "wakeup: p_stat is 2" > > > > > > > being > > > > > > > triggered by kqueue coming from the softnet kernel task. > > > > > > > > > > > > > > I'm running an amd64 kernel built from the tree today (latest CVS > > > > > > > commit > > > > > > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults > > > > > > > from > > > > > > > GENERIC.MP. Userland is from the latest amd snap. > > > > > > > > > > > > > > To reproduce, I'm running a single OpenBSD-current guest under > > > > > > > vmd(8) > > > > > > > which I'm targeting with the following trivial btrace script I was > > > > > > > working on to use for debugging something in vmm(4): > > > > > > > > > > > > > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{ > > > > > > > printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs); > > > > > > > } > > > > > > > > > > > > > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{ > > > > > > > printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs); > > > > > > > } > > > > > > > > > > > > Even easier reproduction: if you have 2 machines and can use > > > > > > tcpbench(1) > > > > > > between them, then while tcpbench is running target it with the > > > > > > above > > > > > > btrace script. I've found running the script, killing it with > > > > > > ctrl-c, > > > > > > and re-running it 2-3 times triggers the panic on my laptop. > > > > > > > > > > > > > > > > > > > > Both times this happened I was trying to sysupgrade the vmd(8) > > > > > > > guest > > > > > > > while running the above btrace script. When I don't run the > > > > > > > script, > > > > > > > there is no panic. > > > > > > > > > > > > > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj > > > > > > > > > > > > > > Simple transcript of the call stack after the panic() call looks > > > > > > > like: > > > > > > > > > > > > > > wakeup_n > > > > > > > kqueue_wakeup > > > > > > > knote > > > > > > > selwakekup > > > > > > > tun_enqueue > > > > > > > ether_output > > > > > > > ip_output > > > > > > > ip_forward > > > > > > > ip_input_if > > > > > > > ipv4_input > > > > > > > ether_input > > > > > > > if_input_process > > > > > > > > > > > > > > The other 3 cpu cores appeared to be in ipi handlers. (Image in > > > > > > > that > > > > > > > imgur link) > > > > > > > > > > I think the problem is recursion within the scheduler. Trace points > > > > > invoke wakeup() directly, which is unsafe when done from within the > > > > > run queue routines. > > > > > > > > > > One approach to avoid the recursion is to defer the wakeup() with > > > > > a soft interrupt. The scheduler runs at an elevated system priority > > > > > level that blocks soft interrupts. The deferred wakeup() is issued > > > > > once > > > > > the system priority level turns low enough. > > > > > > > > > > Unfortunately, the patch will get broken when someone adds trace > > > > > points > > > > > to soft interrupt scheduling... > > > > > > > > Thanks for the report. Sorry for the delay. I'm now really interested > > > > in fixing this bug because I'm heavily using btrace(8) to analyse the > > > > scheduler and I hit this panic a couple of times. > > > > > > > > The problem is that `pnext' might no longer be on the sleepqueue after a > > > > tracepoint inside setrunnable() fired. Diff below fixes that by making > > > > wakeup_n() re-entrant. > > > > > > > > I'm not very interested in committing this diff because it relies on a > > > > recursive SCHED_LOCK(). Instead I'd prefer to split wakeup_n() in two > > > > stages: first unlink the threads then call setrunnable(). This approach > > > > will help us untangle the sleepqueue but needs a bit more shuffling, > > > > like moving unsleep() out of setrunnable()... > > > > > > > > Claudio, Visa do you agree? > > > > > > [...] > > > > > > Ugh, this is too much magic for my little brain. > > > I would prefer to unqueue the procs onto a new local list and have a > > > special wakeup_proc for them. The code around wakeup and unsleep will need > > > some changes to unlock it properly. > > > > Yes, that is what I had in mind. Diff below does that and works for > > me(tm). > > > > The first loop removes threads from the sleep queue and the second put > > them on a runqueue and optionally call tracepoints. > > > > Note that unsleep() has been removed from setrunnable() to reduce the > > complexity of the code. Messing, sometimes, with the state of the > > sleepqueue down in this path doesn't help me get a simpler picture of > > this machinery. > > > > ok? > > I like this a lot. I think this is going in the right direction. > Diff looks good but I only quickly looked at it. > Untangling unsleep() from setrunnable() is a good thing (I hope it works).
I'd like to move forward with this. Any ok? > > Index: dev/pci/drm/drm_linux.c > > =================================================================== > > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v > > retrieving revision 1.109 > > diff -u -p -r1.109 drm_linux.c > > --- dev/pci/drm/drm_linux.c 21 Jan 2024 13:36:40 -0000 1.109 > > +++ dev/pci/drm/drm_linux.c 22 Feb 2024 14:32:27 -0000 > > @@ -162,7 +162,7 @@ wake_up_process(struct proc *p) > > int s, rv; > > > > SCHED_LOCK(s); > > - rv = wakeup_proc(p, NULL, 0); > > + rv = wakeup_proc(p, 0); > > SCHED_UNLOCK(s); > > return rv; > > } > > Index: kern/kern_sig.c > > =================================================================== > > RCS file: /cvs/src/sys/kern/kern_sig.c,v > > retrieving revision 1.321 > > diff -u -p -r1.321 kern_sig.c > > --- kern/kern_sig.c 17 Jan 2024 22:22:25 -0000 1.321 > > +++ kern/kern_sig.c 22 Feb 2024 14:25:16 -0000 > > @@ -1200,6 +1200,7 @@ runfast: > > if (p->p_usrpri > PUSER) > > p->p_usrpri = PUSER; > > run: > > + unsleep(p); > > setrunnable(p); > > out: > > /* finally adjust siglist */ > > @@ -2104,6 +2105,7 @@ single_thread_set(struct proc *p, int fl > > if (q->p_flag & P_WEXIT) { > > if (mode == SINGLE_EXIT) { > > if (q->p_stat == SSTOP) { > > + unsleep(q); > > setrunnable(q); > > atomic_inc_int(&pr->ps_singlecount); > > } > > @@ -2125,12 +2127,14 @@ single_thread_set(struct proc *p, int fl > > break; > > } > > /* need to unwind or exit, so wake it */ > > + unsleep(q); > > setrunnable(q); > > } > > atomic_inc_int(&pr->ps_singlecount); > > break; > > case SSTOP: > > if (mode == SINGLE_EXIT) { > > + unsleep(q); > > setrunnable(q); > > atomic_inc_int(&pr->ps_singlecount); > > } > > @@ -2198,9 +2202,10 @@ single_thread_clear(struct proc *p, int > > * it back into some sleep queue > > */ > > if (q->p_stat == SSTOP && (q->p_flag & flag) == 0) { > > - if (q->p_wchan == NULL) > > + if (q->p_wchan == NULL) { > > + unsleep(q); > > setrunnable(q); > > - else { > > + } else { > > atomic_clearbits_int(&q->p_flag, P_WSLEEP); > > q->p_stat = SSLEEP; > > } > > Index: kern/kern_synch.c > > =================================================================== > > RCS file: /cvs/src/sys/kern/kern_synch.c,v > > retrieving revision 1.200 > > diff -u -p -r1.200 kern_synch.c > > --- kern/kern_synch.c 13 Sep 2023 14:25:49 -0000 1.200 > > +++ kern/kern_synch.c 22 Feb 2024 14:55:05 -0000 > > @@ -467,25 +467,23 @@ sleep_signal_check(void) > > } > > > > int > > -wakeup_proc(struct proc *p, const volatile void *chan, int flags) > > +wakeup_proc(struct proc *p, int flags) > > { > > int awakened = 0; > > > > SCHED_ASSERT_LOCKED(); > > > > - if (p->p_wchan != NULL && > > - ((chan == NULL) || (p->p_wchan == chan))) { > > + if (p->p_wchan != NULL) { > > awakened = 1; > > if (flags) > > atomic_setbits_int(&p->p_flag, flags); > > - if (p->p_stat == SSLEEP) > > - setrunnable(p); > > - else if (p->p_stat == SSTOP) > > - unsleep(p); > > #ifdef DIAGNOSTIC > > - else > > - panic("wakeup: p_stat is %d", (int)p->p_stat); > > + if (p->p_stat != SSLEEP && p->p_stat != SSTOP) > > + panic("thread %d p_stat is %d", p->p_tid, p->p_stat); > > #endif > > + unsleep(p); > > + if (p->p_stat == SSLEEP) > > + setrunnable(p); > > } > > > > return awakened; > > @@ -505,7 +503,7 @@ endtsleep(void *arg) > > int s; > > > > SCHED_LOCK(s); > > - wakeup_proc(p, NULL, P_TIMEOUT); > > + wakeup_proc(p, P_TIMEOUT); > > SCHED_UNLOCK(s); > > } > > > > @@ -531,21 +529,35 @@ unsleep(struct proc *p) > > void > > wakeup_n(const volatile void *ident, int n) > > { > > - struct slpque *qp; > > + struct slpque *qp, wakeq; > > struct proc *p; > > struct proc *pnext; > > int s; > > > > + TAILQ_INIT(&wakeq); > > + > > SCHED_LOCK(s); > > qp = &slpque[LOOKUP(ident)]; > > for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) { > > pnext = TAILQ_NEXT(p, p_runq); > > #ifdef DIAGNOSTIC > > if (p->p_stat != SSLEEP && p->p_stat != SSTOP) > > - panic("wakeup: p_stat is %d", (int)p->p_stat); > > + panic("thread %d p_stat is %d", p->p_tid, p->p_stat); > > #endif > > - if (wakeup_proc(p, ident, 0)) > > + KASSERT(p->p_wchan != NULL); > > + if (p->p_wchan == ident) { > > + TAILQ_REMOVE(qp, p, p_runq); > > + p->p_wchan = NULL; > > + TAILQ_INSERT_TAIL(&wakeq, p, p_runq); > > --n; > > + } > > + } > > + while ((p = TAILQ_FIRST(&wakeq))) { > > + TAILQ_REMOVE(&wakeq, p, p_runq); > > + TRACEPOINT(sched, unsleep, p->p_tid + THREAD_PID_OFFSET, > > + p->p_p->ps_pid); > > + if (p->p_stat == SSLEEP) > > + setrunnable(p); > > } > > SCHED_UNLOCK(s); > > } > > Index: kern/sched_bsd.c > > =================================================================== > > RCS file: /cvs/src/sys/kern/sched_bsd.c,v > > retrieving revision 1.90 > > diff -u -p -r1.90 sched_bsd.c > > --- kern/sched_bsd.c 24 Jan 2024 19:23:38 -0000 1.90 > > +++ kern/sched_bsd.c 22 Feb 2024 14:24:31 -0000 > > @@ -500,12 +500,10 @@ setrunnable(struct proc *p) > > if ((pr->ps_flags & PS_TRACED) != 0 && pr->ps_xsig != 0) > > atomic_setbits_int(&p->p_siglist, sigmask(pr->ps_xsig)); > > prio = p->p_usrpri; > > - unsleep(p); > > setrunqueue(NULL, p, prio); > > break; > > case SSLEEP: > > prio = p->p_slppri; > > - unsleep(p); /* e.g. when sending signals */ > > > > /* if not yet asleep, don't add to runqueue */ > > if (ISSET(p->p_flag, P_WSLEEP)) > > Index: kern/sys_process.c > > =================================================================== > > RCS file: /cvs/src/sys/kern/sys_process.c,v > > retrieving revision 1.95 > > diff -u -p -r1.95 sys_process.c > > --- kern/sys_process.c 21 Nov 2023 14:00:13 -0000 1.95 > > +++ kern/sys_process.c 22 Feb 2024 14:25:34 -0000 > > @@ -493,6 +493,7 @@ ptrace_ctrl(struct proc *p, int req, pid > > if (t->p_stat == SSTOP) { > > tr->ps_xsig = data; > > SCHED_LOCK(s); > > + unsleep(t); > > setrunnable(t); > > SCHED_UNLOCK(s); > > } else { > > Index: sys/proc.h > > =================================================================== > > RCS file: /cvs/src/sys/sys/proc.h,v > > retrieving revision 1.356 > > diff -u -p -r1.356 proc.h > > --- sys/proc.h 3 Feb 2024 18:51:58 -0000 1.356 > > +++ sys/proc.h 22 Feb 2024 14:32:13 -0000 > > @@ -555,7 +555,7 @@ void procinit(void); > > void setpriority(struct proc *, uint32_t, uint8_t); > > void setrunnable(struct proc *); > > void endtsleep(void *); > > -int wakeup_proc(struct proc *, const volatile void *, int); > > +int wakeup_proc(struct proc *, int); > > void unsleep(struct proc *); > > void reaper(void *); > > __dead void exit1(struct proc *, int, int, int); > > -- > :wq Claudio >