On 22/02/24(Thu) 17:24, Claudio Jeker wrote:
> On Thu, Feb 22, 2024 at 04:16:57PM +0100, Martin Pieuchot wrote:
> > On 21/02/24(Wed) 13:05, Claudio Jeker wrote:
> > > On Tue, Feb 20, 2024 at 09:34:12PM +0100, Martin Pieuchot wrote:
> > > > On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> > > > > On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> > > > > > Dave Voutila <d...@sisu.io> writes:
> > > > > > 
> > > > > > > Was tinkering on a bt(5) script for trying to debug an issue in 
> > > > > > > vmm(4)
> > > > > > > when I managed to start hitting a panic "wakeup: p_stat is 2" 
> > > > > > > being
> > > > > > > triggered by kqueue coming from the softnet kernel task.
> > > > > > >
> > > > > > > I'm running an amd64 kernel built from the tree today (latest CVS 
> > > > > > > commit
> > > > > > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults 
> > > > > > > from
> > > > > > > GENERIC.MP. Userland is from the latest amd snap.
> > > > > > >
> > > > > > > To reproduce, I'm running a single OpenBSD-current guest under 
> > > > > > > vmd(8)
> > > > > > > which I'm targeting with the following trivial btrace script I was
> > > > > > > working on to use for debugging something in vmm(4):
> > > > > > >
> > > > > > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > > > > > >   printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > > > > > }
> > > > > > >
> > > > > > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > > > > > >   printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > > > > > }
> > > > > > 
> > > > > > Even easier reproduction: if you have 2 machines and can use 
> > > > > > tcpbench(1)
> > > > > > between them, then while tcpbench is running target it with the 
> > > > > > above
> > > > > > btrace script. I've found running the script, killing it with 
> > > > > > ctrl-c,
> > > > > > and re-running it 2-3 times triggers the panic on my laptop.
> > > > > > 
> > > > > > >
> > > > > > > Both times this happened I was trying to sysupgrade the vmd(8) 
> > > > > > > guest
> > > > > > > while running the above btrace script. When I don't run the 
> > > > > > > script,
> > > > > > > there is no panic.
> > > > > > >
> > > > > > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > > > > > >
> > > > > > > Simple transcript of the call stack after the panic() call looks 
> > > > > > > like:
> > > > > > >
> > > > > > > wakeup_n
> > > > > > > kqueue_wakeup
> > > > > > > knote
> > > > > > > selwakekup
> > > > > > > tun_enqueue
> > > > > > > ether_output
> > > > > > > ip_output
> > > > > > > ip_forward
> > > > > > > ip_input_if
> > > > > > > ipv4_input
> > > > > > > ether_input
> > > > > > > if_input_process
> > > > > > >
> > > > > > > The other 3 cpu cores appeared to be in ipi handlers. (Image in 
> > > > > > > that
> > > > > > > imgur link)
> > > > > 
> > > > > I think the problem is recursion within the scheduler. Trace points
> > > > > invoke wakeup() directly, which is unsafe when done from within the
> > > > > run queue routines.
> > > > > 
> > > > > One approach to avoid the recursion is to defer the wakeup() with
> > > > > a soft interrupt. The scheduler runs at an elevated system priority
> > > > > level that blocks soft interrupts. The deferred wakeup() is issued 
> > > > > once
> > > > > the system priority level turns low enough.
> > > > > 
> > > > > Unfortunately, the patch will get broken when someone adds trace 
> > > > > points
> > > > > to soft interrupt scheduling...
> > > > 
> > > > Thanks for the report.  Sorry for the delay.  I'm now really interested
> > > > in fixing this bug because I'm heavily using btrace(8) to analyse the
> > > > scheduler and I hit this panic a couple of times.
> > > > 
> > > > The problem is that `pnext' might no longer be on the sleepqueue after a
> > > > tracepoint inside setrunnable() fired.  Diff below fixes that by making
> > > > wakeup_n() re-entrant.
> > > > 
> > > > I'm not very interested in committing this diff because it relies on a
> > > > recursive SCHED_LOCK().  Instead I'd prefer to split wakeup_n() in two
> > > > stages: first unlink the threads then call setrunnable().  This approach
> > > > will help us untangle the sleepqueue but needs a bit more shuffling,
> > > > like moving unsleep() out of setrunnable()...
> > > > 
> > > > Claudio, Visa do you agree?
> > >
> > > [...] 
> > >
> > > Ugh, this is too much magic for my little brain.
> > > I would prefer to unqueue the procs onto a new local list and have a
> > > special wakeup_proc for them. The code around wakeup and unsleep will need
> > > some changes to unlock it properly.
> > 
> > Yes, that is what I had in mind.  Diff below does that and works for
> > me(tm).
> > 
> > The first loop removes threads from the sleep queue and the second put
> > them on a runqueue and optionally call tracepoints.
> > 
> > Note that unsleep() has been removed from setrunnable() to reduce the
> > complexity of the code.  Messing, sometimes, with the state of the
> > sleepqueue down in this path doesn't help me get a simpler picture of
> > this machinery.
> > 
> > ok?
> 
> I like this a lot. I think this is going in the right direction.
> Diff looks good but I only quickly looked at it.
> Untangling unsleep() from setrunnable() is a good thing (I hope it works).

I'd like to move forward with this.  Any ok?

> > Index: dev/pci/drm/drm_linux.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > retrieving revision 1.109
> > diff -u -p -r1.109 drm_linux.c
> > --- dev/pci/drm/drm_linux.c 21 Jan 2024 13:36:40 -0000      1.109
> > +++ dev/pci/drm/drm_linux.c 22 Feb 2024 14:32:27 -0000
> > @@ -162,7 +162,7 @@ wake_up_process(struct proc *p)
> >     int s, rv;
> >  
> >     SCHED_LOCK(s);
> > -   rv = wakeup_proc(p, NULL, 0);
> > +   rv = wakeup_proc(p, 0);
> >     SCHED_UNLOCK(s);
> >     return rv;
> >  }
> > Index: kern/kern_sig.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_sig.c,v
> > retrieving revision 1.321
> > diff -u -p -r1.321 kern_sig.c
> > --- kern/kern_sig.c 17 Jan 2024 22:22:25 -0000      1.321
> > +++ kern/kern_sig.c 22 Feb 2024 14:25:16 -0000
> > @@ -1200,6 +1200,7 @@ runfast:
> >     if (p->p_usrpri > PUSER)
> >             p->p_usrpri = PUSER;
> >  run:
> > +   unsleep(p);
> >     setrunnable(p);
> >  out:
> >     /* finally adjust siglist */
> > @@ -2104,6 +2105,7 @@ single_thread_set(struct proc *p, int fl
> >             if (q->p_flag & P_WEXIT) {
> >                     if (mode == SINGLE_EXIT) {
> >                             if (q->p_stat == SSTOP) {
> > +                                   unsleep(q);
> >                                     setrunnable(q);
> >                                     atomic_inc_int(&pr->ps_singlecount);
> >                             }
> > @@ -2125,12 +2127,14 @@ single_thread_set(struct proc *p, int fl
> >                                     break;
> >                             }
> >                             /* need to unwind or exit, so wake it */
> > +                           unsleep(q);
> >                             setrunnable(q);
> >                     }
> >                     atomic_inc_int(&pr->ps_singlecount);
> >                     break;
> >             case SSTOP:
> >                     if (mode == SINGLE_EXIT) {
> > +                           unsleep(q);
> >                             setrunnable(q);
> >                             atomic_inc_int(&pr->ps_singlecount);
> >                     }
> > @@ -2198,9 +2202,10 @@ single_thread_clear(struct proc *p, int 
> >              * it back into some sleep queue
> >              */
> >             if (q->p_stat == SSTOP && (q->p_flag & flag) == 0) {
> > -                   if (q->p_wchan == NULL)
> > +                   if (q->p_wchan == NULL) {
> > +                           unsleep(q);
> >                             setrunnable(q);
> > -                   else {
> > +                   } else {
> >                             atomic_clearbits_int(&q->p_flag, P_WSLEEP);
> >                             q->p_stat = SSLEEP;
> >                     }
> > Index: kern/kern_synch.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.200
> > diff -u -p -r1.200 kern_synch.c
> > --- kern/kern_synch.c       13 Sep 2023 14:25:49 -0000      1.200
> > +++ kern/kern_synch.c       22 Feb 2024 14:55:05 -0000
> > @@ -467,25 +467,23 @@ sleep_signal_check(void)
> >  }
> >  
> >  int
> > -wakeup_proc(struct proc *p, const volatile void *chan, int flags)
> > +wakeup_proc(struct proc *p, int flags)
> >  {
> >     int awakened = 0;
> >  
> >     SCHED_ASSERT_LOCKED();
> >  
> > -   if (p->p_wchan != NULL &&
> > -      ((chan == NULL) || (p->p_wchan == chan))) {
> > +   if (p->p_wchan != NULL) {
> >             awakened = 1;
> >             if (flags)
> >                     atomic_setbits_int(&p->p_flag, flags);
> > -           if (p->p_stat == SSLEEP)
> > -                   setrunnable(p);
> > -           else if (p->p_stat == SSTOP)
> > -                   unsleep(p);
> >  #ifdef DIAGNOSTIC
> > -           else
> > -                   panic("wakeup: p_stat is %d", (int)p->p_stat);
> > +           if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> > +                   panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
> >  #endif
> > +           unsleep(p);
> > +           if (p->p_stat == SSLEEP)
> > +                   setrunnable(p);
> >     }
> >  
> >     return awakened;
> > @@ -505,7 +503,7 @@ endtsleep(void *arg)
> >     int s;
> >  
> >     SCHED_LOCK(s);
> > -   wakeup_proc(p, NULL, P_TIMEOUT);
> > +   wakeup_proc(p, P_TIMEOUT);
> >     SCHED_UNLOCK(s);
> >  }
> >  
> > @@ -531,21 +529,35 @@ unsleep(struct proc *p)
> >  void
> >  wakeup_n(const volatile void *ident, int n)
> >  {
> > -   struct slpque *qp;
> > +   struct slpque *qp, wakeq;
> >     struct proc *p;
> >     struct proc *pnext;
> >     int s;
> >  
> > +   TAILQ_INIT(&wakeq);
> > +
> >     SCHED_LOCK(s);
> >     qp = &slpque[LOOKUP(ident)];
> >     for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
> >             pnext = TAILQ_NEXT(p, p_runq);
> >  #ifdef DIAGNOSTIC
> >             if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> > -                   panic("wakeup: p_stat is %d", (int)p->p_stat);
> > +                   panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
> >  #endif
> > -           if (wakeup_proc(p, ident, 0))
> > +           KASSERT(p->p_wchan != NULL);
> > +           if (p->p_wchan == ident) {
> > +                   TAILQ_REMOVE(qp, p, p_runq);
> > +                   p->p_wchan = NULL;
> > +                   TAILQ_INSERT_TAIL(&wakeq, p, p_runq);
> >                     --n;
> > +           }
> > +   }
> > +   while ((p = TAILQ_FIRST(&wakeq))) {
> > +           TAILQ_REMOVE(&wakeq, p, p_runq);
> > +           TRACEPOINT(sched, unsleep, p->p_tid + THREAD_PID_OFFSET,
> > +               p->p_p->ps_pid);
> > +           if (p->p_stat == SSLEEP)
> > +                   setrunnable(p);
> >     }
> >     SCHED_UNLOCK(s);
> >  }
> > Index: kern/sched_bsd.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> > retrieving revision 1.90
> > diff -u -p -r1.90 sched_bsd.c
> > --- kern/sched_bsd.c        24 Jan 2024 19:23:38 -0000      1.90
> > +++ kern/sched_bsd.c        22 Feb 2024 14:24:31 -0000
> > @@ -500,12 +500,10 @@ setrunnable(struct proc *p)
> >             if ((pr->ps_flags & PS_TRACED) != 0 && pr->ps_xsig != 0)
> >                     atomic_setbits_int(&p->p_siglist, sigmask(pr->ps_xsig));
> >             prio = p->p_usrpri;
> > -           unsleep(p);
> >             setrunqueue(NULL, p, prio);
> >             break;
> >     case SSLEEP:
> >             prio = p->p_slppri;
> > -           unsleep(p);             /* e.g. when sending signals */
> >  
> >             /* if not yet asleep, don't add to runqueue */
> >             if (ISSET(p->p_flag, P_WSLEEP))
> > Index: kern/sys_process.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/sys_process.c,v
> > retrieving revision 1.95
> > diff -u -p -r1.95 sys_process.c
> > --- kern/sys_process.c      21 Nov 2023 14:00:13 -0000      1.95
> > +++ kern/sys_process.c      22 Feb 2024 14:25:34 -0000
> > @@ -493,6 +493,7 @@ ptrace_ctrl(struct proc *p, int req, pid
> >             if (t->p_stat == SSTOP) {
> >                     tr->ps_xsig = data;
> >                     SCHED_LOCK(s);
> > +                   unsleep(t);
> >                     setrunnable(t);
> >                     SCHED_UNLOCK(s);
> >             } else {
> > Index: sys/proc.h
> > ===================================================================
> > RCS file: /cvs/src/sys/sys/proc.h,v
> > retrieving revision 1.356
> > diff -u -p -r1.356 proc.h
> > --- sys/proc.h      3 Feb 2024 18:51:58 -0000       1.356
> > +++ sys/proc.h      22 Feb 2024 14:32:13 -0000
> > @@ -555,7 +555,7 @@ void    procinit(void);
> >  void       setpriority(struct proc *, uint32_t, uint8_t);
> >  void       setrunnable(struct proc *);
> >  void       endtsleep(void *);
> > -int        wakeup_proc(struct proc *, const volatile void *, int);
> > +int        wakeup_proc(struct proc *, int);
> >  void       unsleep(struct proc *);
> >  void       reaper(void *);
> >  __dead void exit1(struct proc *, int, int, int);
> 
> -- 
> :wq Claudio
> 

Reply via email to