Re: Fix clearing of sleep timeouts

2022-06-06 Thread Visa Hankala
On Mon, Jun 06, 2022 at 06:47:32AM +1000, David Gwynne wrote:
> On Sun, Jun 05, 2022 at 03:57:39PM +, Visa Hankala wrote:
> > On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> > > On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > > > Encountered the following panic:
> > > > 
> > > > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" 
> > > > failed: file "/usr/src/sys/kern/kern_synch.c", line 373
> > > > Stopped at  db_enter+0x10:  popq%rbp
> > > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > > >  423109  57118 55 0x3  02  link
> > > >  330695  30276 55 0x3  03  link
> > > > * 46366  85501 55  0x1003  0x40804001  link
> > > >  188803  85501 55  0x1003  0x40820000K link
> > > > db_enter() at db_enter+0x10
> > > > panic(81f25d2b) at panic+0xbf
> > > > __assert(81f9a186,81f372c8,175,81f87c6c) at 
> > > > __assert+0x25
> > > > sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
> > > > sleep_setup+0x1d8
> > > > cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
> > > > timeout_barrier(8000228a28b0) at timeout_barrier+0x109
> > > > timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
> > > > sleep_finish(800022d64d90,1) at sleep_finish+0x16d
> > > > tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
> > > > sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
> > > > sys_nanosleep+0x12d
> > > > syscall(800022d64f60) at syscall+0x374
> > > > 
> > > > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > > > soft-interrupt-driven timeouts could be deleted synchronously without
> > > > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > > > of the timeout.
> > > > 
> > > > It looks that with small adjustments timeout_del_barrier() can sleep
> > > > in sleep_finish(). The management of run queues is not affected because
> > > > the timeout clearing happens after it. As timeout_del_barrier() does not
> > > > rely on a timeout or signal catching, there should be no risk of
> > > > unbounded recursion or unwanted signal side effects within the sleep
> > > > machinery. In a way, a sleep with a timeout is higher-level than
> > > > one without.
> > > 
> > > I trust you on the analysis.  However this looks very fragile to me.
> > > 
> > > The use of timeout_del_barrier() which can sleep using the global sleep
> > > queue is worrying me.  
> > 
> > I think the queue handling ends in sleep_finish() when SCHED_LOCK()
> > is released. The timeout clearing is done outside of it.
> 
> That's ok.
> 
> > The extra sleeping point inside sleep_finish() is subtle. It should not
> > matter in typical use. But is it permissible with the API? Also, if
> > timeout_del_barrier() sleeps, the thread's priority can change.
> 
> What other options do we have at this point? Spin? Allocate the timeout
> dynamically so sleep_finish doesn't have to wait for it and let the
> handler clean up? How would you stop the timeout handler waking up the
> thread if it's gone back to sleep again for some other reason?

In principle, each thread could have a sleep serial number. If the
serial number was somehow associated with the timeout, the handler
could bail out if the thread has moved on. However, implementing that
association looks tricky. Dynamic allocation could provide a suitable
context struct, but memory allocation in this code would be awful.
Maybe there should be an extended version of timeout_add() that allows
controlled updating of the timeout argument.. Does not sound appealing.

Spinning might be an option in the future. The kernel lock complicates
things, however. The spinning wait would have to release the kernel lock
to avoid possible deadlocking.

> Sleeping here is the least worst option.

So it seems.

> As for timeout_del_barrier, if prio is a worry we can provide an
> advanced version of it that lets you pass the prio in. I'd also
> like to change timeout_barrier so it queues the barrier task at the
> head of the pending lists rather than at the tail.

I think it is not important to preserve the priority here. Later
scheduling events will override it anyway.

> 
> > Note that sleep_finish() already can take an additional nap when
> > signal catching is enabled.
> > 
> > > > Note that endtsleep() can run and set P_TIMEOUT during
> > > > timeout_del_barrier() when the thread is blocked in cond_wait().
> > > > To avoid unnecessary atomic read-modify-write operations, the clearing
> > > > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > > > optimization at this point.
> > > 
> > > I agree this optimization seems unnecessary at the moment.
> > > 
> > > > While it should be possible to make the code use timeout_del() instead
> > > > of timeout_del_barrier(), the outcome might not be outright better. For
> > > > example, sleep_

Re: Fix clearing of sleep timeouts

2022-06-06 Thread Martin Pieuchot
On 06/06/22(Mon) 06:47, David Gwynne wrote:
> On Sun, Jun 05, 2022 at 03:57:39PM +, Visa Hankala wrote:
> > On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> > > On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > > > Encountered the following panic:
> > > > 
> > > > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" 
> > > > failed: file "/usr/src/sys/kern/kern_synch.c", line 373
> > > > Stopped at  db_enter+0x10:  popq%rbp
> > > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > > >  423109  57118 55 0x3  02  link
> > > >  330695  30276 55 0x3  03  link
> > > > * 46366  85501 55  0x1003  0x40804001  link
> > > >  188803  85501 55  0x1003  0x40820000K link
> > > > db_enter() at db_enter+0x10
> > > > panic(81f25d2b) at panic+0xbf
> > > > __assert(81f9a186,81f372c8,175,81f87c6c) at 
> > > > __assert+0x25
> > > > sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
> > > > sleep_setup+0x1d8
> > > > cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
> > > > timeout_barrier(8000228a28b0) at timeout_barrier+0x109
> > > > timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
> > > > sleep_finish(800022d64d90,1) at sleep_finish+0x16d
> > > > tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
> > > > sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
> > > > sys_nanosleep+0x12d
> > > > syscall(800022d64f60) at syscall+0x374
> > > > 
> > > > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > > > soft-interrupt-driven timeouts could be deleted synchronously without
> > > > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > > > of the timeout.
> > > > 
> > > > It looks that with small adjustments timeout_del_barrier() can sleep
> > > > in sleep_finish(). The management of run queues is not affected because
> > > > the timeout clearing happens after it. As timeout_del_barrier() does not
> > > > rely on a timeout or signal catching, there should be no risk of
> > > > unbounded recursion or unwanted signal side effects within the sleep
> > > > machinery. In a way, a sleep with a timeout is higher-level than
> > > > one without.
> > > 
> > > I trust you on the analysis.  However this looks very fragile to me.
> > > 
> > > The use of timeout_del_barrier() which can sleep using the global sleep
> > > queue is worrying me.  
> > 
> > I think the queue handling ends in sleep_finish() when SCHED_LOCK()
> > is released. The timeout clearing is done outside of it.
> 
> That's ok.
> 
> > The extra sleeping point inside sleep_finish() is subtle. It should not
> > matter in typical use. But is it permissible with the API? Also, if
> > timeout_del_barrier() sleeps, the thread's priority can change.
> 
> What other options do we have at this point? Spin? Allocate the timeout
> dynamically so sleep_finish doesn't have to wait for it and let the
> handler clean up? How would you stop the timeout handler waking up the
> thread if it's gone back to sleep again for some other reason?
> 
> Sleeping here is the least worst option.

I agree.  I don't think sleeping is bad here.  My concern is about how
sleeping is implemented.  There's a single API built on top of a single
global data structure which now calls itself recursively.  

I'm not sure how much work it would be to make cond_wait(9) use its own
sleep queue...  This is something independent from this fix though.

> As for timeout_del_barrier, if prio is a worry we can provide an
> advanced version of it that lets you pass the prio in. I'd also
> like to change timeout_barrier so it queues the barrier task at the
> head of the pending lists rather than at the tail.

I doubt prio matter.



Re: Fix clearing of sleep timeouts

2022-06-05 Thread David Gwynne
On Sun, Jun 05, 2022 at 03:57:39PM +, Visa Hankala wrote:
> On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> > On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > > Encountered the following panic:
> > > 
> > > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> > > file "/usr/src/sys/kern/kern_synch.c", line 373
> > > Stopped at  db_enter+0x10:  popq%rbp
> > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > >  423109  57118 55 0x3  02  link
> > >  330695  30276 55 0x3  03  link
> > > * 46366  85501 55  0x1003  0x40804001  link
> > >  188803  85501 55  0x1003  0x40820000K link
> > > db_enter() at db_enter+0x10
> > > panic(81f25d2b) at panic+0xbf
> > > __assert(81f9a186,81f372c8,175,81f87c6c) at 
> > > __assert+0x25
> > > sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
> > > sleep_setup+0x1d8
> > > cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
> > > timeout_barrier(8000228a28b0) at timeout_barrier+0x109
> > > timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
> > > sleep_finish(800022d64d90,1) at sleep_finish+0x16d
> > > tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
> > > sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
> > > sys_nanosleep+0x12d
> > > syscall(800022d64f60) at syscall+0x374
> > > 
> > > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > > soft-interrupt-driven timeouts could be deleted synchronously without
> > > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > > of the timeout.
> > > 
> > > It looks that with small adjustments timeout_del_barrier() can sleep
> > > in sleep_finish(). The management of run queues is not affected because
> > > the timeout clearing happens after it. As timeout_del_barrier() does not
> > > rely on a timeout or signal catching, there should be no risk of
> > > unbounded recursion or unwanted signal side effects within the sleep
> > > machinery. In a way, a sleep with a timeout is higher-level than
> > > one without.
> > 
> > I trust you on the analysis.  However this looks very fragile to me.
> > 
> > The use of timeout_del_barrier() which can sleep using the global sleep
> > queue is worrying me.  
> 
> I think the queue handling ends in sleep_finish() when SCHED_LOCK()
> is released. The timeout clearing is done outside of it.

That's ok.

> The extra sleeping point inside sleep_finish() is subtle. It should not
> matter in typical use. But is it permissible with the API? Also, if
> timeout_del_barrier() sleeps, the thread's priority can change.

What other options do we have at this point? Spin? Allocate the timeout
dynamically so sleep_finish doesn't have to wait for it and let the
handler clean up? How would you stop the timeout handler waking up the
thread if it's gone back to sleep again for some other reason?

Sleeping here is the least worst option.

As for timeout_del_barrier, if prio is a worry we can provide an
advanced version of it that lets you pass the prio in. I'd also
like to change timeout_barrier so it queues the barrier task at the
head of the pending lists rather than at the tail.

> Note that sleep_finish() already can take an additional nap when
> signal catching is enabled.
> 
> > > Note that endtsleep() can run and set P_TIMEOUT during
> > > timeout_del_barrier() when the thread is blocked in cond_wait().
> > > To avoid unnecessary atomic read-modify-write operations, the clearing
> > > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > > optimization at this point.
> > 
> > I agree this optimization seems unnecessary at the moment.
> > 
> > > While it should be possible to make the code use timeout_del() instead
> > > of timeout_del_barrier(), the outcome might not be outright better. For
> > > example, sleep_setup() and endtsleep() would have to coordinate so that
> > > a late-running timeout from previous sleep cycle would not disturb the
> > > new cycle.
> > 
> > So that's the price for not having to sleep in sleep_finish(), right?
> 
> That is correct. Some synchronization is needed in any case.
> 
> > > To test the barrier path reliably, I made the code call
> > > timeout_del_barrier() twice in a row. The second call is guaranteed
> > > to sleep. Of course, this is not part of the patch.
> > 
> > ok mpi@
> > 
> > > Index: kern/kern_synch.c
> > > ===
> > > RCS file: src/sys/kern/kern_synch.c,v
> > > retrieving revision 1.187
> > > diff -u -p -r1.187 kern_synch.c
> > > --- kern/kern_synch.c 13 May 2022 15:32:00 -  1.187
> > > +++ kern/kern_synch.c 5 Jun 2022 05:04:45 -
> > > @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
> > >   p->p_slppri = prio & PRIMASK;
> > >   TAILQ_INSERT_TAIL(&slpque[LOOK

Re: Fix clearing of sleep timeouts

2022-06-05 Thread Visa Hankala
On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > Encountered the following panic:
> > 
> > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 373
> > Stopped at  db_enter+0x10:  popq%rbp
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >  423109  57118 55 0x3  02  link
> >  330695  30276 55 0x3  03  link
> > * 46366  85501 55  0x1003  0x40804001  link
> >  188803  85501 55  0x1003  0x40820000K link
> > db_enter() at db_enter+0x10
> > panic(81f25d2b) at panic+0xbf
> > __assert(81f9a186,81f372c8,175,81f87c6c) at 
> > __assert+0x25
> > sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
> > sleep_setup+0x1d8
> > cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
> > timeout_barrier(8000228a28b0) at timeout_barrier+0x109
> > timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
> > sleep_finish(800022d64d90,1) at sleep_finish+0x16d
> > tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
> > sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
> > sys_nanosleep+0x12d
> > syscall(800022d64f60) at syscall+0x374
> > 
> > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > soft-interrupt-driven timeouts could be deleted synchronously without
> > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > of the timeout.
> > 
> > It looks that with small adjustments timeout_del_barrier() can sleep
> > in sleep_finish(). The management of run queues is not affected because
> > the timeout clearing happens after it. As timeout_del_barrier() does not
> > rely on a timeout or signal catching, there should be no risk of
> > unbounded recursion or unwanted signal side effects within the sleep
> > machinery. In a way, a sleep with a timeout is higher-level than
> > one without.
> 
> I trust you on the analysis.  However this looks very fragile to me.
> 
> The use of timeout_del_barrier() which can sleep using the global sleep
> queue is worrying me.  

I think the queue handling ends in sleep_finish() when SCHED_LOCK()
is released. The timeout clearing is done outside of it.

The extra sleeping point inside sleep_finish() is subtle. It should not
matter in typical use. But is it permissible with the API? Also, if
timeout_del_barrier() sleeps, the thread's priority can change.

Note that sleep_finish() already can take an additional nap when
signal catching is enabled.

> > Note that endtsleep() can run and set P_TIMEOUT during
> > timeout_del_barrier() when the thread is blocked in cond_wait().
> > To avoid unnecessary atomic read-modify-write operations, the clearing
> > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > optimization at this point.
> 
> I agree this optimization seems unnecessary at the moment.
> 
> > While it should be possible to make the code use timeout_del() instead
> > of timeout_del_barrier(), the outcome might not be outright better. For
> > example, sleep_setup() and endtsleep() would have to coordinate so that
> > a late-running timeout from previous sleep cycle would not disturb the
> > new cycle.
> 
> So that's the price for not having to sleep in sleep_finish(), right?

That is correct. Some synchronization is needed in any case.

> > To test the barrier path reliably, I made the code call
> > timeout_del_barrier() twice in a row. The second call is guaranteed
> > to sleep. Of course, this is not part of the patch.
> 
> ok mpi@
> 
> > Index: kern/kern_synch.c
> > ===
> > RCS file: src/sys/kern/kern_synch.c,v
> > retrieving revision 1.187
> > diff -u -p -r1.187 kern_synch.c
> > --- kern/kern_synch.c   13 May 2022 15:32:00 -  1.187
> > +++ kern/kern_synch.c   5 Jun 2022 05:04:45 -
> > @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
> > p->p_slppri = prio & PRIMASK;
> > TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
> >  
> > -   KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > if (timo) {
> > +   KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > sls->sls_timeout = 1;
> > timeout_add(&p->p_sleep_to, timo);
> > }
> > @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
> >  
> > if (sls->sls_timeout) {
> > if (p->p_flag & P_TIMEOUT) {
> > -   atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> > error1 = EWOULDBLOCK;
> > } else {
> > -   /* This must not sleep. */
> > +   /* This can sleep. It must not use timeouts. */
> > timeout_del_barrier(&p->p_sleep_to);
> > -   KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > }

Re: Fix clearing of sleep timeouts

2022-06-05 Thread Martin Pieuchot
On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> Encountered the following panic:
> 
> panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> file "/usr/src/sys/kern/kern_synch.c", line 373
> Stopped at  db_enter+0x10:  popq%rbp
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  423109  57118 55 0x3  02  link
>  330695  30276 55 0x3  03  link
> * 46366  85501 55  0x1003  0x40804001  link
>  188803  85501 55  0x1003  0x40820000K link
> db_enter() at db_enter+0x10
> panic(81f25d2b) at panic+0xbf
> __assert(81f9a186,81f372c8,175,81f87c6c) at 
> __assert+0x25
> sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
> sleep_setup+0x1d8
> cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
> timeout_barrier(8000228a28b0) at timeout_barrier+0x109
> timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
> sleep_finish(800022d64d90,1) at sleep_finish+0x16d
> tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
> sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
> sys_nanosleep+0x12d
> syscall(800022d64f60) at syscall+0x374
> 
> The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> soft-interrupt-driven timeouts could be deleted synchronously without
> blocking. Now, timeout_del_barrier() can sleep regardless of the type
> of the timeout.
> 
> It looks that with small adjustments timeout_del_barrier() can sleep
> in sleep_finish(). The management of run queues is not affected because
> the timeout clearing happens after it. As timeout_del_barrier() does not
> rely on a timeout or signal catching, there should be no risk of
> unbounded recursion or unwanted signal side effects within the sleep
> machinery. In a way, a sleep with a timeout is higher-level than
> one without.

I trust you on the analysis.  However this looks very fragile to me.

The use of timeout_del_barrier() which can sleep using the global sleep
queue is worrying me.  

> Note that endtsleep() can run and set P_TIMEOUT during
> timeout_del_barrier() when the thread is blocked in cond_wait().
> To avoid unnecessary atomic read-modify-write operations, the clearing
> of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> optimization at this point.

I agree this optimization seems unnecessary at the moment.

> While it should be possible to make the code use timeout_del() instead
> of timeout_del_barrier(), the outcome might not be outright better. For
> example, sleep_setup() and endtsleep() would have to coordinate so that
> a late-running timeout from previous sleep cycle would not disturb the
> new cycle.

So that's the price for not having to sleep in sleep_finish(), right?

> To test the barrier path reliably, I made the code call
> timeout_del_barrier() twice in a row. The second call is guaranteed
> to sleep. Of course, this is not part of the patch.

ok mpi@

> Index: kern/kern_synch.c
> ===
> RCS file: src/sys/kern/kern_synch.c,v
> retrieving revision 1.187
> diff -u -p -r1.187 kern_synch.c
> --- kern/kern_synch.c 13 May 2022 15:32:00 -  1.187
> +++ kern/kern_synch.c 5 Jun 2022 05:04:45 -
> @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
>   p->p_slppri = prio & PRIMASK;
>   TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
>  
> - KASSERT((p->p_flag & P_TIMEOUT) == 0);
>   if (timo) {
> + KASSERT((p->p_flag & P_TIMEOUT) == 0);
>   sls->sls_timeout = 1;
>   timeout_add(&p->p_sleep_to, timo);
>   }
> @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
>  
>   if (sls->sls_timeout) {
>   if (p->p_flag & P_TIMEOUT) {
> - atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
>   error1 = EWOULDBLOCK;
>   } else {
> - /* This must not sleep. */
> + /* This can sleep. It must not use timeouts. */
>   timeout_del_barrier(&p->p_sleep_to);
> - KASSERT((p->p_flag & P_TIMEOUT) == 0);
>   }
> + atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
>   }
>  
>   /* Check if thread was woken up because of a unwind or signal */
> 



Fix clearing of sleep timeouts

2022-06-04 Thread Visa Hankala
Encountered the following panic:

panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: file 
"/usr/src/sys/kern/kern_synch.c", line 373
Stopped at  db_enter+0x10:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 423109  57118 55 0x3  02  link
 330695  30276 55 0x3  03  link
* 46366  85501 55  0x1003  0x40804001  link
 188803  85501 55  0x1003  0x40820000K link
db_enter() at db_enter+0x10
panic(81f25d2b) at panic+0xbf
__assert(81f9a186,81f372c8,175,81f87c6c) at 
__assert+0x25
sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at 
sleep_setup+0x1d8
cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46
timeout_barrier(8000228a28b0) at timeout_barrier+0x109
timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2
sleep_finish(800022d64d90,1) at sleep_finish+0x16d
tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2
sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at 
sys_nanosleep+0x12d
syscall(800022d64f60) at syscall+0x374

The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
soft-interrupt-driven timeouts could be deleted synchronously without
blocking. Now, timeout_del_barrier() can sleep regardless of the type
of the timeout.

It looks that with small adjustments timeout_del_barrier() can sleep
in sleep_finish(). The management of run queues is not affected because
the timeout clearing happens after it. As timeout_del_barrier() does not
rely on a timeout or signal catching, there should be no risk of
unbounded recursion or unwanted signal side effects within the sleep
machinery. In a way, a sleep with a timeout is higher-level than
one without.

Note that endtsleep() can run and set P_TIMEOUT during
timeout_del_barrier() when the thread is blocked in cond_wait().
To avoid unnecessary atomic read-modify-write operations, the clearing
of P_TIMEOUT could be conditional, but maybe that is an unnecessary
optimization at this point.

While it should be possible to make the code use timeout_del() instead
of timeout_del_barrier(), the outcome might not be outright better. For
example, sleep_setup() and endtsleep() would have to coordinate so that
a late-running timeout from previous sleep cycle would not disturb the
new cycle.

To test the barrier path reliably, I made the code call
timeout_del_barrier() twice in a row. The second call is guaranteed
to sleep. Of course, this is not part of the patch.

OK?

Index: kern/kern_synch.c
===
RCS file: src/sys/kern/kern_synch.c,v
retrieving revision 1.187
diff -u -p -r1.187 kern_synch.c
--- kern/kern_synch.c   13 May 2022 15:32:00 -  1.187
+++ kern/kern_synch.c   5 Jun 2022 05:04:45 -
@@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
p->p_slppri = prio & PRIMASK;
TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
 
-   KASSERT((p->p_flag & P_TIMEOUT) == 0);
if (timo) {
+   KASSERT((p->p_flag & P_TIMEOUT) == 0);
sls->sls_timeout = 1;
timeout_add(&p->p_sleep_to, timo);
}
@@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
 
if (sls->sls_timeout) {
if (p->p_flag & P_TIMEOUT) {
-   atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
error1 = EWOULDBLOCK;
} else {
-   /* This must not sleep. */
+   /* This can sleep. It must not use timeouts. */
timeout_del_barrier(&p->p_sleep_to);
-   KASSERT((p->p_flag & P_TIMEOUT) == 0);
}
+   atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
}
 
/* Check if thread was woken up because of a unwind or signal */