On 25/07/24(Thu) 17:33, Claudio Jeker wrote:
> On Thu, Jul 25, 2024 at 05:15:32PM +0200, Martin Pieuchot wrote:
> > On 25/07/24(Thu) 14:51, Claudio Jeker wrote:
> > > On Thu, Jul 25, 2024 at 11:09:44AM +0200, Martin Pieuchot wrote:
> > > [...]
> > > > > Index: kern/kern_synch.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/kern/kern_synch.c,v
> > > > > diff -u -p -r1.206 kern_synch.c
> > > > > --- kern/kern_synch.c 23 Jul 2024 08:38:02 -0000 1.206
> > > > > +++ kern/kern_synch.c 24 Jul 2024 14:14:06 -0000
> > > > > @@ -399,15 +399,18 @@ sleep_finish(int timo, int do_sleep)
> > > > > */
> > > > > if (p->p_wchan == NULL)
> > > > > do_sleep = 0;
> > > > > + KASSERT((p->p_flag & P_SINGLESLEEP) == 0);
> > > > > atomic_clearbits_int(&p->p_flag, P_WSLEEP);
> > > > >
> > > > > > + /* If requested to stop always force a stop even if do_sleep == 0 */
> > > > > + if (p->p_stat == SSTOP)
> > > > > + do_sleep = 1;
> > > >
> > > > > This is also scary. The problem with the current scheme is that we don't
> > > > > know who changed `p_stat' and if we already did our context switch or not.
> > >
> > > It is scary indeed. I would prefer if sleep_signal_check() would not
> > > randomly mi_switch() away behind our back. I first tried that but it is
> > > > harder than you think.
> >
> > So maybe let's start by adding:
> >
> > KASSERT(!(p->p_stat == SSTOP && do_sleep == 0))
> >
> > And see if something blows.
>
> I can assure you single_thread_set() is able to put p->p_stat to SSTOP
> between these lines:
> if ((error = sleep_signal_check(p)) != 0) {
> catch = 0;
> do_sleep = 0;
> }
> }
>
> SCHED_LOCK();
>
> So it can happen, but the window is small enough (mainly the call to
> cursig() and some minimal other fluff) that the KASSERT will probably
> never hit.

This whole discussion makes me believe that SINGLE_SUSPEND is the
incorrect solution for this and should die.

Instead of trying to change the state of siblings in single_thread_set()
and context switching in single_thread_check(), all threads should stop
inside cursig().

I really appreciate all the efforts you've put into debugging this.
However, I cannot believe that adding more hacks and checking for p_stat
is the way to go.
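
To illustrate the shape of "every thread stops itself at one point" as
opposed to "somebody else flips the state of its siblings", here is a rough
user-space sketch with pthreads. It is only an analogy and not kernel code:
struct stop_gate, gate_checkpoint(), gate_stop_and_wait(), gate_release()
and demo_run() are hypothetical names made up for this example, and nothing
here says how cursig() would actually have to change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct stop_gate {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		stop_requested;	/* somebody wants all workers parked */
	bool		done;		/* workers should leave their loop */
	int		parked;		/* workers currently stopped in the gate */
};

void
gate_init(struct stop_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->stop_requested = false;
	g->done = false;
	g->parked = 0;
}

/* The single stop point: a worker only ever parks itself, here. */
bool
gate_checkpoint(struct stop_gate *g)
{
	bool keep_running;

	pthread_mutex_lock(&g->mtx);
	if (g->stop_requested) {
		g->parked++;
		pthread_cond_broadcast(&g->cv);	/* tell the requester */
		while (g->stop_requested)
			pthread_cond_wait(&g->cv, &g->mtx);
		g->parked--;
	}
	keep_running = !g->done;
	pthread_mutex_unlock(&g->mtx);
	return keep_running;
}

/* Ask for a stop and wait until n workers have parked themselves. */
int
gate_stop_and_wait(struct stop_gate *g, int n)
{
	int seen;

	pthread_mutex_lock(&g->mtx);
	g->stop_requested = true;
	while (g->parked < n)
		pthread_cond_wait(&g->cv, &g->mtx);
	seen = g->parked;
	pthread_mutex_unlock(&g->mtx);
	return seen;
}

/* Let parked workers continue; with finish set they exit their loop. */
void
gate_release(struct stop_gate *g, bool finish)
{
	pthread_mutex_lock(&g->mtx);
	g->stop_requested = false;
	g->done = finish;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}

static void *
worker(void *arg)
{
	while (gate_checkpoint(arg))
		continue;
	return NULL;
}

#define DEMO_THREADS	4

/* Returns how many workers were observed parked at the gate. */
int
demo_run(void)
{
	struct stop_gate g;
	pthread_t t[DEMO_THREADS];
	int i, n = 0, seen;

	gate_init(&g);
	for (i = 0; i < DEMO_THREADS; i++)
		if (pthread_create(&t[n], NULL, worker, &g) == 0)
			n++;
	seen = gate_stop_and_wait(&g, n);
	gate_release(&g, true);
	for (i = 0; i < n; i++)
		pthread_join(t[i], NULL);
	assert(g.parked == 0);
	return seen;
}
```

In this model the requester never writes to a sibling's state. It sets one
flag and waits for the count, and each worker changes its own state under
the same mutex, so there is no window in which the requester has to guess
whether a sibling already switched away.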