Re: timeout_del_barrier(9): remove kernel lock

2021-05-10 Thread David Gwynne
On Thu, May 06, 2021 at 11:43:55AM -0500, Scott Cheloha wrote:
> [...]

Re: timeout_del_barrier(9): remove kernel lock

2021-05-06 Thread Scott Cheloha
On Wed, May 05, 2021 at 11:05:06AM +1000, David Gwynne wrote:
> On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote:
> > 
> > [...]
> > 
> > Here is where I get confused.
> > 
> > Why do I want to wait for *all* timeouts in the queue to finish
> > running?
> 
> You don't, you just want to know that whatever timeout was running
> has finished cos you don't know if that currently running timeout is
> yours or not.
> 
> > My understanding of the timeout_del_barrier(9) use case was as
> > follows:
> > 
> > 1. I have a dynamically allocated timeout struct.  The timeout is
> >scheduled to execute at some point in the future.
> > 
> > 2. I want to free the memory allocated for the timeout.
> > 
> > 3. To safely free the memory I need to ensure the timeout
> >is not executing.  Until the timeout function, i.e.
> >to->to_func(), has returned it isn't necessarily safe to
> >free that memory.
> > 
> > 4. If I call timeout_del_barrier(9), it will only return if the
> >timeout in question is not running.  Assuming the timeout itself
> >is not a periodic timeout that reschedules itself we can then
> >safely free the memory.
> 
> Barriers aren't about references to timeouts, they're about references
> to the work associated with a timeout.
> 
> If you only cared about the timeout struct itself, then you can get
> all the information about whether it's referenced by the timeout
> queues from the return values from timeout_add and timeout_del.
> timeout_add() returns 1 if the timeout subsystem takes the reference
> to the timeout. If the timeout is already scheduled it returns 0
> because it's already got a reference. timeout_del returns 1 if the
> timeout itself was scheduled and it removed it, therefore giving
> up its reference. If you timeout_del and it returns 0, then it
> wasn't on a queue inside timeouts. Easy.
> 
> What timeout_add and timeout_del don't tell you is whether the work
> referenced by the timeout is currently running. The timeout runners
> copy the function and argument onto the stack when they dequeue a
> timeout to run, and give up the reference to the timeout when the
> mutex around the timeout wheels/cirqs is released. However, the argument
> is still referenced on the stack and the function to process it may be
> about to run or is running. You can't tell that from the timeout_add/del
> return values though.
> 
> We provide two ways to deal with that. One is you have reference
> counters (or similar) on the thing you're deferring to a timeout.
> The other is you use a barrier so you know the work you deferred
> isn't on the timeout runners stack anymore because it's moved on
> to run the barrier work.
> 
> This is consistent with tasks and interrupts.
> 
> > Given this understanding, my approach was to focus on the timeout in
> > question.  So my code just spins until the timeout is no longer
> > running.
> > 
> > But now I think I am mistaken.
> > 
> > IIUC you're telling me (and showing me, in code) that the goal of
> > timeout_barrier() is to wait for the *whole* softclock() to return,
> > not just the one timeout.
> 
> No, just the currently running timeout.
> 
> Putting the barrier timeout on the end of the proc and todo cirqs
> is because there's no CIRCQ_INSERT_HEAD I can use right now.
> 
> > Why do I want to wait for the whole softclock() or softclock_thread()?
> > Why not just wait for the one timeout to finish?
> 
> Cos I was too lazy to write CIRCQ_INSERT_HEAD last night :D

Okay.  After thinking this over I'm pretty sure we are skinning the
same cat in two different ways.

Your cond_wait(9) approach is fine for both timeout types because
timeout_del_barrier(9) is only safe to call from process context.
When I wrote my first diff I was under the impression it was safe to
call timeout_del_barrier(9) from interrupt context.  But I reread the
manpage and now I see that this is not the case, my bad.

> [...]
> 
> This discussion has some relevance to taskqs too. I was also lazy when I
> implemented taskq_barrier and used task_add to put the barrier onto the
> list of work, which meant it got added to the end. Now DRM relies on
> this behaviour. Maybe I should have pushed to commit the diff below.
> 
> This diff lets taskq barriers run as soon as possible, but adds compat
> to the drm workqueue stuff so they can wait for all pending work to
> finish as they expect.
> 
> P.S. don't call timeout_del from inside a timeout handler. I still think
> timeout_set_proc was a mistake.
> 
> [...]

Let's pick this whole train of thought (taskq, etc.) up in a
different thread.  One thing at a time.

--

Updated patch:

- Keep timeout_barrier(9).

- In timeout_barrier(9) use the barrier timeout for both normal and
  process-context timeouts.  All callers use cond_wait(9) now.

- Remove the kernel lock from the non-process path.  I assume I don't
  need to splx(splschedclock()) anymore?

- Rename "timeout_proc_barrier()" to "timeout_barrier_timeout()"
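
Roughly, the updated routine would look like this.  This is only a sketch of
what the bullets above describe, assuming a timeout_barrier_timeout() handler
that just does cond_signal(9); it is not the actual diff:

```c
void
timeout_barrier(struct timeout *to)
{
	struct timeout barrier;
	struct cond c = COND_INITIALIZER();
	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);

	timeout_sync_order(needsproc);

	if (needsproc)
		timeout_set_proc(&barrier, timeout_barrier_timeout, &c);
	else
		timeout_set(&barrier, timeout_barrier_timeout, &c);
	barrier.to_process = curproc->p_p;

	mtx_enter(&timeout_mutex);
	SET(barrier.to_flags, TIMEOUT_ONQUEUE);
	if (needsproc)
		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
	else {
		/* softclock() picks the barrier up on the next tick */
		CIRCQ_INSERT_TAIL(&timeout_todo, &barrier.to_list);
	}
	mtx_leave(&timeout_mutex);

	if (needsproc)
		wakeup_one(&timeout_proc);

	cond_wait(&c, "tmobar");
}
```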
  

Re: timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread David Gwynne
On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote:
> On Tue, May 04, 2021 at 11:21:27PM +1000, David Gwynne wrote:
> > On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> > > On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > > > [...] 
> > > > I want to run softclock() without the kernel lock.  The way to go, I
> > > > think, is to first push the kernel lock down into timeout_run(), and
> > > > then to remove the kernel lock from each timeout, one by one.
> > > 
> > > Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> > > more latency than running all timeouts in a batch after having grabbed
> > > the KERNEL_LOCK().  I doubt this is the best way forward.
> > > 
> > > > Before we can push the kernel lock down into timeout_run() we need to
> > > > remove the kernel lock from timeout_del_barrier(9).
> > > 
> > > Seems worth it on its own.
> > >
> > > > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > > > the given timeout has stopped running.  Because the softclock() runs
> > > > with the kernel lock we currently assume that once the calling thread
> > > has taken the kernel lock any ongoing softclock() must have returned
> > > > and relinquished the lock, so the timeout in question has returned.
> > 
> > as i'll try to explain below, it's not about waiting for a specific
> > timeout, it's about knowing that a currently running timeout has
> > finished.
> 
> I'm very confused about the distinction between the two.

It is subtle :(

> > > So you want to stop using the KERNEL_LOCK() to do the serialization?  
> > 
> > the KERNEL_LOCK in timeout_barrier is there cos it was an elegant^Wclever^W hack.
> > because timeouts are run under the kernel lock, i knew i could take the
> > lock and know that timeouts arent running anymore. that's all.
> > 
> > in hindsight this means that the thread calling timeout_barrier spins
> > when it could sleep, and worse, it spins waiting for all pending
> > timeouts to run. so yeah, i agree that undoing the hack is worth it on
> > its own.
> 
> Okay so we're all in agreement that this could be improved, cool.

Yep.

> > > > The simplest replacement I can think of is a volatile pointer to the
> > > > running timeout that we set before leaving the timeout_mutex and clear
> > > > after reentering the same during timeout_run().
> > > 
> > > Sounds like a condition variable protected by this mutex.  Interesting
> > > that cond_wait(9) doesn't work with a mutex. 
> > 
> > cond_wait borrows the sched lock to coordinate between the waiting
> > thread and the signalling context. the cond api is basically a wrapper
> > around sleep_setup/sleep_finish.
> > 
> > > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > > > spins until the timeout function returns and the timeout_running
> > > > pointer is changed.  Not every caller can sleep during
> > > > timeout_del_barrier(9).  I think spinning is the simplest thing that
> > > > will definitely work here.
> > > 
> > > This keeps the current semantic indeed.
> > 
> > i don't want to throw timeout_barrier out just yet.
> 
> I'm fine with that... where might it be useful?

A single timeout_barrier call can be used as a barrier for multiple
timeouts. This would feel more natural if timeout_barrier took a flag
instead of a timeout pointer, or if we didnt have the timeout_proc
context.
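
For example, a hypothetical driver detach (sc_tick and sc_tmo are made-up
names) only needs one barrier call, as long as both timeouts run in the same
context:

```c
	/* sketch: both are plain softint timeouts */
	timeout_del(&sc->sc_tick);
	timeout_del(&sc->sc_tmo);

	/*
	 * Neither can be dequeued again, so one barrier is enough
	 * to know whichever handler was mid-run has finished.
	 */
	timeout_barrier(&sc->sc_tick);

	free(sc, M_DEVBUF, sizeof(*sc));
```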

> > > > -void
> > > > -timeout_barrier(struct timeout *to)
> > > > +int
> > > > +timeout_del_barrier(struct timeout *to)
> > > >  {
> > > > +	struct timeout barrier;
> > > > +	struct cond c = COND_INITIALIZER();
> > > >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > > >  
> > > >  	timeout_sync_order(needsproc);
> > > >  
> > > > -	if (!needsproc) {
> > > > -		KERNEL_LOCK();
> > > > -		splx(splsoftclock());
> > > > -		KERNEL_UNLOCK();
> > > > -	} else {
> > > > -		struct cond c = COND_INITIALIZER();
> > > > -		struct timeout barrier;
> > > > +	mtx_enter(&timeout_mutex);
> > > > +
> > > > +	if (timeout_del_locked(to)) {
> > > > +		mtx_leave(&timeout_mutex);
> > > > +		return 1;
> > > > +	}
> > > >  
> > > > +	if (needsproc) {
> > > >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > > >  		barrier.to_process = curproc->p_p;
> > > > -
> > > > -		mtx_enter(&timeout_mutex);
> > > >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > > >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > > >  		mtx_leave(&timeout_mutex);
> > > > -
> > > >  		wakeup_one(&timeout_proc);
> > > > -
> > > >  		cond_wait(&c, "tmobar");
> > > > +	} else {
> > > > +		mtx_leave(&timeout_mutex);
> > > > +		/* XXX Is this in the right spot? */
> > > > +		splx(splsoftclock());
> > > > +		while (timeout_running == to)
> > > > +			CPU_BUSY_CYCLE();

Re: timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread Scott Cheloha
On Tue, May 04, 2021 at 11:21:27PM +1000, David Gwynne wrote:
> On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> > On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > > [...] 
> > > I want to run softclock() without the kernel lock.  The way to go, I
> > > think, is to first push the kernel lock down into timeout_run(), and
> > > then to remove the kernel lock from each timeout, one by one.
> > 
> > Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> > more latency than running all timeouts in a batch after having grabbed
> > the KERNEL_LOCK().  I doubt this is the best way forward.
> > 
> > > Before we can push the kernel lock down into timeout_run() we need to
> > > remove the kernel lock from timeout_del_barrier(9).
> > 
> > Seems worth it on its own.
> >
> > > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > > the given timeout has stopped running.  Because the softclock() runs
> > > with the kernel lock we currently assume that once the calling thread
> > > has taken the kernel lock any ongoing softclock() must have returned
> > > and relinquished the lock, so the timeout in question has returned.
> 
> as i'll try to explain below, it's not about waiting for a specific
> timeout, it's about knowing that a currently running timeout has
> finished.

I'm very confused about the distinction between the two.

> > So you want to stop using the KERNEL_LOCK() to do the serialization?  
> 
> the KERNEL_LOCK in timeout_barrier is there cos it was an elegant^Wclever^W hack.
> because timeouts are run under the kernel lock, i knew i could take the
> lock and know that timeouts arent running anymore. that's all.
> 
> in hindsight this means that the thread calling timeout_barrier spins
> when it could sleep, and worse, it spins waiting for all pending
> timeouts to run. so yeah, i agree that undoing the hack is worth it on
> its own.

Okay so we're all in agreement that this could be improved, cool.

> > > The simplest replacement I can think of is a volatile pointer to the
> > > running timeout that we set before leaving the timeout_mutex and clear
> > > after reentering the same during timeout_run().
> > 
> > Sounds like a condition variable protected by this mutex.  Interesting
> > that cond_wait(9) doesn't work with a mutex. 
> 
> cond_wait borrows the sched lock to coordinate between the waiting
> thread and the signalling context. the cond api is basically a wrapper
> around sleep_setup/sleep_finish.
> 
> > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > > spins until the timeout function returns and the timeout_running
> > > pointer is changed.  Not every caller can sleep during
> > > timeout_del_barrier(9).  I think spinning is the simplest thing that
> > > will definitely work here.
> > 
> > This keeps the current semantic indeed.
> 
> i don't want to throw timeout_barrier out just yet.

I'm fine with that... where might it be useful?

> > > -void
> > > -timeout_barrier(struct timeout *to)
> > > +int
> > > +timeout_del_barrier(struct timeout *to)
> > >  {
> > > +	struct timeout barrier;
> > > +	struct cond c = COND_INITIALIZER();
> > >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > >  
> > >  	timeout_sync_order(needsproc);
> > >  
> > > -	if (!needsproc) {
> > > -		KERNEL_LOCK();
> > > -		splx(splsoftclock());
> > > -		KERNEL_UNLOCK();
> > > -	} else {
> > > -		struct cond c = COND_INITIALIZER();
> > > -		struct timeout barrier;
> > > +	mtx_enter(&timeout_mutex);
> > > +
> > > +	if (timeout_del_locked(to)) {
> > > +		mtx_leave(&timeout_mutex);
> > > +		return 1;
> > > +	}
> > >  
> > > +	if (needsproc) {
> > >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > >  		barrier.to_process = curproc->p_p;
> > > -
> > > -		mtx_enter(&timeout_mutex);
> > >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > >  		mtx_leave(&timeout_mutex);
> > > -
> > >  		wakeup_one(&timeout_proc);
> > > -
> > >  		cond_wait(&c, "tmobar");
> > > +	} else {
> > > +		mtx_leave(&timeout_mutex);
> > > +		/* XXX Is this in the right spot? */
> > > +		splx(splsoftclock());
> > > +		while (timeout_running == to)
> > > +			CPU_BUSY_CYCLE();
> > 
> > Won't splx() execute the soft-interrupt if there's any pending?
> > Shouldn't the barrier be before?  Could you add `spc->spc_spinning++'
> > around the spinning loop?  What happens if two threads call
> > timeout_del_barrier(9) with the same argument at the same time?  Is
> > it possible and/or supported?
> 
> the timeout passed to timeout_barrier is only used to figure out which
> context the barrier should apply to, ie, it's used to pick between the
> softint or proc queue runners. like the other barriers in other parts of
> the kernel, it's just supposed to wait for the current work to finish
> running.

Here is where I get confused.

Why do I want to wait for *all* timeouts in the queue to finish
running?

Re: timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread Scott Cheloha
On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > [...] 
> > I want to run softclock() without the kernel lock.  The way to go, I
> > think, is to first push the kernel lock down into timeout_run(), and
> > then to remove the kernel lock from each timeout, one by one.
> 
> Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> more latency than running all timeouts in a batch after having grabbed
> the KERNEL_LOCK().  I doubt this is the best way forward.

I have heard different views on this.

visa@ said once, and I paraphrase, "increasing the latency is a good
way to motivate people to remove the kernel lock from their timeouts."

If this "tough love" approach is unacceptable, we could also divide
timeouts up into two queues based on the presence of a flag, say
TIMEOUT_MPSAFE.  You would then run part of the softclock() under the
kernel lock and the other part without it.  We would probably pre-sort
the timeouts at scheduling time, i.e. timeout_add(9), to avoid an
additional O(n) traversal during softclock() to sort them.
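
A sketch of that shape; TIMEOUT_MPSAFE, the queue names, and the
run_timeouts() helper are all made up here:

```c
/*
 * Hypothetical: timeout_add(9) sorts each timeout onto one of two
 * to-run queues by a TIMEOUT_MPSAFE flag, so softclock() pays for
 * the kernel lock once per batch instead of once per timeout.
 */
void
softclock(void *arg)
{
	/* ... wheel processing fills timeout_todo_locked/_mpsafe ... */

	KERNEL_LOCK();
	run_timeouts(&timeout_todo_locked);	/* legacy handlers */
	KERNEL_UNLOCK();

	run_timeouts(&timeout_todo_mpsafe);	/* TIMEOUT_MPSAFE handlers */
}
```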

> > Before we can push the kernel lock down into timeout_run() we need to
> > remove the kernel lock from timeout_del_barrier(9).
> 
> Seems worth it on its own.

Yes.  dlg@ seems to agree, too.

> > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > the given timeout has stopped running.  Because the softclock() runs
> > with the kernel lock we currently assume that once the calling thread
> > has taken the kernel lock any ongoing softclock() must have returned
> > and relinquished the lock, so the timeout in question has returned.
> 
> So you want to stop using the KERNEL_LOCK() to do the serialization?

Exactly.

> > The simplest replacement I can think of is a volatile pointer to the
> > running timeout that we set before leaving the timeout_mutex and clear
> > after reentering the same during timeout_run().
> 
> Sounds like a condition variable protected by this mutex.  Interesting
> that cond_wait(9) doesn't work with a mutex.

I can't speak on this.  dlg@ can.

> > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > spins until the timeout function returns and the timeout_running
> > pointer is changed.  Not every caller can sleep during
> > timeout_del_barrier(9).  I think spinning is the simplest thing that
> > will definitely work here.
> 
> This keeps the current semantic indeed.

Apparently not.  See dlg@'s mail.

> > -void
> > -timeout_barrier(struct timeout *to)
> > +int
> > +timeout_del_barrier(struct timeout *to)
> >  {
> > +	struct timeout barrier;
> > +	struct cond c = COND_INITIALIZER();
> >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> >  
> >  	timeout_sync_order(needsproc);
> >  
> > -	if (!needsproc) {
> > -		KERNEL_LOCK();
> > -		splx(splsoftclock());
> > -		KERNEL_UNLOCK();
> > -	} else {
> > -		struct cond c = COND_INITIALIZER();
> > -		struct timeout barrier;
> > +	mtx_enter(&timeout_mutex);
> > +
> > +	if (timeout_del_locked(to)) {
> > +		mtx_leave(&timeout_mutex);
> > +		return 1;
> > +	}
> >  
> > +	if (needsproc) {
> >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> >  		barrier.to_process = curproc->p_p;
> > -
> > -		mtx_enter(&timeout_mutex);
> >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> >  		mtx_leave(&timeout_mutex);
> > -
> >  		wakeup_one(&timeout_proc);
> > -
> >  		cond_wait(&c, "tmobar");
> > +	} else {
> > +		mtx_leave(&timeout_mutex);
> > +		/* XXX Is this in the right spot? */
> > +		splx(splsoftclock());
> > +		while (timeout_running == to)
> > +			CPU_BUSY_CYCLE();
> 
> Won't splx() execute the soft-interrupt if there's any pending?

Yes, I believe that that's the point.  If the calling thread is
running on the primary CPU we want to give softclock() a chance to run
before we spin.

> Shouldn't the barrier be before?  Could you add `spc->spc_spinning++'
> around the spinning loop?

That's easy enough, sure.

> What happens if two threads call timeout_del_barrier(9) with the same
> argument at the same time?  Is it possible and/or supported?

That would work as expected, yes.

(okay, now onto dlg@'s mail.)



Re: timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread David Gwynne
On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > [...] 
> > I want to run softclock() without the kernel lock.  The way to go, I
> > think, is to first push the kernel lock down into timeout_run(), and
> > then to remove the kernel lock from each timeout, one by one.
> 
> Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> more latency than running all timeouts in a batch after having grabbed
> the KERNEL_LOCK().  I doubt this is the best way forward.
> 
> > Before we can push the kernel lock down into timeout_run() we need to
> > remove the kernel lock from timeout_del_barrier(9).
> 
> Seems worth it on its own.
>
> > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > the given timeout has stopped running.  Because the softclock() runs
> > with the kernel lock we currently assume that once the calling thread
> > has taken the kernel lock any ongoing softclock() must have returned
> > and relinquished the lock, so the timeout in question has returned.

as i'll try to explain below, it's not about waiting for a specific
timeout, it's about knowing that a currently running timeout has
finished.

> So you want to stop using the KERNEL_LOCK() to do the serialization?  

the KERNEL_LOCK in timeout_barrier is there cos it was an elegant^Wclever^W hack.
because timeouts are run under the kernel lock, i knew i could take the
lock and know that timeouts arent running anymore. that's all.

in hindsight this means that the thread calling timeout_barrier spins
when it could sleep, and worse, it spins waiting for all pending
timeouts to run. so yeah, i agree that undoing the hack is worth it on
its own.

> > The simplest replacement I can think of is a volatile pointer to the
> > running timeout that we set before leaving the timeout_mutex and clear
> > after reentering the same during timeout_run().
> 
> Sounds like a condition variable protected by this mutex.  Interesting
> that cond_wait(9) doesn't work with a mutex. 

cond_wait borrows the sched lock to coordinate between the waiting
thread and the signalling context. the cond api is basically a wrapper
around sleep_setup/sleep_finish.

> > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > spins until the timeout function returns and the timeout_running
> > pointer is changed.  Not every caller can sleep during
> > timeout_del_barrier(9).  I think spinning is the simplest thing that
> > will definitely work here.
> 
> This keeps the current semantic indeed.

i don't want to throw timeout_barrier out just yet.

> > -void
> > -timeout_barrier(struct timeout *to)
> > +int
> > +timeout_del_barrier(struct timeout *to)
> >  {
> > +	struct timeout barrier;
> > +	struct cond c = COND_INITIALIZER();
> >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> >  
> >  	timeout_sync_order(needsproc);
> >  
> > -	if (!needsproc) {
> > -		KERNEL_LOCK();
> > -		splx(splsoftclock());
> > -		KERNEL_UNLOCK();
> > -	} else {
> > -		struct cond c = COND_INITIALIZER();
> > -		struct timeout barrier;
> > +	mtx_enter(&timeout_mutex);
> > +
> > +	if (timeout_del_locked(to)) {
> > +		mtx_leave(&timeout_mutex);
> > +		return 1;
> > +	}
> >  
> > +	if (needsproc) {
> >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> >  		barrier.to_process = curproc->p_p;
> > -
> > -		mtx_enter(&timeout_mutex);
> >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> >  		mtx_leave(&timeout_mutex);
> > -
> >  		wakeup_one(&timeout_proc);
> > -
> >  		cond_wait(&c, "tmobar");
> > +	} else {
> > +		mtx_leave(&timeout_mutex);
> > +		/* XXX Is this in the right spot? */
> > +		splx(splsoftclock());
> > +		while (timeout_running == to)
> > +			CPU_BUSY_CYCLE();
> 
> Won't splx() execute the soft-interrupt if there's any pending?
> Shouldn't the barrier be before?  Could you add `spc->spc_spinning++'
> around the spinning loop?  What happens if two threads call
> timeout_del_barrier(9) with the same argument at the same time?  Is
> it possible and/or supported?

the timeout passed to timeout_barrier is only used to figure out which
context the barrier should apply to, ie, it's used to pick between the
softint or proc queue runners. like the other barriers in other parts of
the kernel, it's just supposed to wait for the current work to finish
running.

it is unfortunate that the timeout_barrier man page isn't very
clear. it's worth reading the manpage for intr_barrier and
taskq_barrier. sched_barrier is a thing that exists too, but isn't
documented. also have a look at the EXAMPLE in the cond_wait(9) manpage
too. however, don't read the taskq_barrier code. timeout_barrier
is like intr_barrier in that it uses the argument to find out where
work runs, but again, it doesn't care about the specific timeout.

Re: timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread Martin Pieuchot
On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> [...] 
> I want to run softclock() without the kernel lock.  The way to go, I
> think, is to first push the kernel lock down into timeout_run(), and
> then to remove the kernel lock from each timeout, one by one.

Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
more latency than running all timeouts in a batch after having grabbed
the KERNEL_LOCK().  I doubt this is the best way forward.

> Before we can push the kernel lock down into timeout_run() we need to
> remove the kernel lock from timeout_del_barrier(9).

Seems worth it on its own.

> The kernel lock is used in timeout_del_barrier(9) to determine whether
> the given timeout has stopped running.  Because the softclock() runs
> with the kernel lock we currently assume that once the calling thread
> has taken the kernel lock any ongoing softclock() must have returned
> and relinquished the lock, so the timeout in question has returned.

So you want to stop using the KERNEL_LOCK() to do the serialization?  
 
> The simplest replacement I can think of is a volatile pointer to the
> running timeout that we set before leaving the timeout_mutex and clear
> after reentering the same during timeout_run().

Sounds like a condition variable protected by this mutex.  Interesting
that cond_wait(9) doesn't work with a mutex. 

> So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> spins until the timeout function returns and the timeout_running
> pointer is changed.  Not every caller can sleep during
> timeout_del_barrier(9).  I think spinning is the simplest thing that
> will definitely work here.

This keeps the current semantic indeed.

> -void
> -timeout_barrier(struct timeout *to)
> +int
> +timeout_del_barrier(struct timeout *to)
>  {
> +	struct timeout barrier;
> +	struct cond c = COND_INITIALIZER();
>  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
>  
>  	timeout_sync_order(needsproc);
>  
> -	if (!needsproc) {
> -		KERNEL_LOCK();
> -		splx(splsoftclock());
> -		KERNEL_UNLOCK();
> -	} else {
> -		struct cond c = COND_INITIALIZER();
> -		struct timeout barrier;
> +	mtx_enter(&timeout_mutex);
> +
> +	if (timeout_del_locked(to)) {
> +		mtx_leave(&timeout_mutex);
> +		return 1;
> +	}
>  
> +	if (needsproc) {
>  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
>  		barrier.to_process = curproc->p_p;
> -
> -		mtx_enter(&timeout_mutex);
>  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
>  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
>  		mtx_leave(&timeout_mutex);
> -
>  		wakeup_one(&timeout_proc);
> -
>  		cond_wait(&c, "tmobar");
> +	} else {
> +		mtx_leave(&timeout_mutex);
> +		/* XXX Is this in the right spot? */
> +		splx(splsoftclock());
> +		while (timeout_running == to)
> +			CPU_BUSY_CYCLE();

Won't splx() execute the soft-interrupt if there's any pending?
Shouldn't the barrier be before?  Could you add `spc->spc_spinning++'
around the spinning loop?  What happens if two threads call
timeout_del_barrier(9) with the same argument at the same time?  Is
it possible and/or supported?



timeout_del_barrier(9): remove kernel lock

2021-05-04 Thread Scott Cheloha
Hi,

I want to run softclock() without the kernel lock.  The way to go, I
think, is to first push the kernel lock down into timeout_run(), and
then to remove the kernel lock from each timeout, one by one.

Before we can push the kernel lock down into timeout_run() we need to
remove the kernel lock from timeout_del_barrier(9).

The kernel lock is used in timeout_del_barrier(9) to determine whether
the given timeout has stopped running.  Because the softclock() runs
with the kernel lock we currently assume that once the calling thread
has taken the kernel lock any ongoing softclock() must have returned
and relinquished the lock, so the timeout in question has returned.

The simplest replacement I can think of is a volatile pointer to the
running timeout that we set before leaving the timeout_mutex and clear
after reentering the same during timeout_run().

So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
spins until the timeout function returns and the timeout_running
pointer is changed.  Not every caller can sleep during
timeout_del_barrier(9).  I think spinning is the simplest thing that
will definitely work here.

There is no behavior change in the TIMEOUT_PROC case.  The kernel lock
was never involved in that path.

Misc:

- The timeout_barrier(9) routine has one caller.  Can we just fold it
  into timeout_del_barrier(9) or do we need to keep it separate for API
  completeness?  See patch.

- Assuming it's okay to fold timeout_barrier() into timeout_del_barrier():
  I have merged two separate timeout_mutex sections into one.  Unclear if
  I still need a second timeout_sync_order() call before returning from
  timeout_del_barrier().

- Where, if anywhere, does the splx(splsoftclock()) call belong in
  the non-TIMEOUT_PROC case?  In the current code it is between
  KERNEL_LOCK() and KERNEL_UNLOCK().  There is no equivalent spot
  in the new code.

(Manpage changes omitted for the moment to keep the diff short.)

Thoughts?

Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.83
diff -u -p -r1.83 kern_timeout.c
--- kern/kern_timeout.c 8 Feb 2021 08:18:45 -   1.83
+++ kern/kern_timeout.c 4 May 2021 06:05:42 -
@@ -79,6 +79,8 @@ struct circq timeout_proc;/* [T] Due +
 time_t timeout_level_width[WHEELCOUNT];	/* [I] Wheel level width (seconds) */
 struct timespec tick_ts;   /* [I] Length of a tick (1/hz secs) */
 
+volatile struct timeout *timeout_running; /* [T] Running timeout, if any */
+
 struct kclock {
struct timespec kc_lastscan;/* [T] Clock time at last wheel scan */
struct timespec kc_late;/* [T] Late if due prior */
@@ -173,6 +175,7 @@ void softclock_process_kclock_timeout(st
 void softclock_process_tick_timeout(struct timeout *, int);
 void softclock_thread(void *);
 uint32_t timeout_bucket(const struct timeout *);
+int timeout_del_locked(struct timeout *);
 uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
 void timeout_run(struct timeout *);
 void timeout_proc_barrier(void *);
@@ -455,11 +458,12 @@ kclock_nanotime(int kclock, struct times
 }
 
 int
-timeout_del(struct timeout *to)
+timeout_del_locked(struct timeout *to)
 {
int ret = 0;
 
-	mtx_enter(&timeout_mutex);
+	MUTEX_ASSERT_LOCKED(&timeout_mutex);
+
 	if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
 		CIRCQ_REMOVE(&to->to_list);
 		CLR(to->to_flags, TIMEOUT_ONQUEUE);
@@ -468,52 +472,54 @@ timeout_del(struct timeout *to)
 	}
 	CLR(to->to_flags, TIMEOUT_TRIGGERED);
 	tostat.tos_deleted++;
-	mtx_leave(&timeout_mutex);
 
 	return ret;
 }
 
 int
-timeout_del_barrier(struct timeout *to)
+timeout_del(struct timeout *to)
 {
-   int removed;
+   int status;
 
-   timeout_sync_order(ISSET(to->to_flags, TIMEOUT_PROC));
-
-   removed = timeout_del(to);
-   if (!removed)
-   timeout_barrier(to);
-
-   return removed;
+	mtx_enter(&timeout_mutex);
+	status = timeout_del_locked(to);
+	mtx_leave(&timeout_mutex);
+   return status;
 }
 
-void
-timeout_barrier(struct timeout *to)
+int
+timeout_del_barrier(struct timeout *to)
 {
+   struct timeout barrier;
+   struct cond c = COND_INITIALIZER();
int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
 
timeout_sync_order(needsproc);
 
-   if (!needsproc) {
-   KERNEL_LOCK();
-   splx(splsoftclock());
-   KERNEL_UNLOCK();
-   } else {
-   struct cond c = COND_INITIALIZER();
-   struct timeout barrier;
+	mtx_enter(&timeout_mutex);
+
+	if (timeout_del_locked(to)) {
+		mtx_leave(&timeout_mutex);
+		return 1;
+	}
 
+	if (needsproc) {
 		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
 		barrier.to_process = curproc->p_p;
-
-		mtx_enter(&timeout_mutex);