Re: timeout_del_barrier(9): remove kernel lock
On Thu, May 06, 2021 at 11:43:55AM -0500, Scott Cheloha wrote: > On Wed, May 05, 2021 at 11:05:06AM +1000, David Gwynne wrote: > > On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote: > > > > > > [...] > > > > > > Here is where I get confused. > > > > > > Why do I want to wait for *all* timeouts in the queue to finish > > > running? > > > > You don't, you just want to know that whatever timeout was running > > has finished cos you don't know if that currently running timeout is > > yours or not. > > > > > My understanding of the timeout_del_barrier(9) use case was as > > > follows: > > > > > > 1. I have a dynamically allocated timeout struct. The timeout is > > >scheduled to execute at some point in the future. > > > > > > 2. I want to free the memory allocated to for the timeout. > > > > > > 3. To safely free the memory I need to ensure the timeout > > >is not executing. Until the timeout function, i.e. > > >to->to_func(), has returned it isn't necessarily safe to > > >free that memory. > > > > > > 4. If I call timeout_del_barrier(9), it will only return if the > > >timeout in question is not running. Assuming the timeout itself > > >is not a periodic timeout that reschedules itself we can then > > >safely free the memory. > > > > Barriers aren't about references to timeouts, they're about references > > to the work associated with a timeout. > > > > If you only cared about the timeout struct itself, then you can get > > all the information about whether it's referenced by the timeout > > queues from the return values from timeout_add and timeout_del. > > timeout_add() returns 1 if the timeout subsystem takes the reference > > to the timeout. If the timeout is already scheduled it returns 0 > > because it's already got a reference. timeout_del returns 1 if the > > timeout itself was scheduled and it removed it, therefore giving > > up it's reference. If you timeout_del and it returns 0, then it > > wasn't on a queue inside timeouts. Easy. 
> > > > What timeout_add and timeout_del don't tell you is whether the work > > referenced by the timeout is currently running. The timeout runners > > copy the function and argument onto the stack when they dequeue a > > timeout to run, and give up the reference to the timeout when the > > mutex around the timeout wheels/cirqs is released. However, the argument > > is still referenced on the stack and the function to process it may be > > about to run or is running. You can't tell that from the timeout_add/del > > return values though. > > > > We provide two ways to deal with that. One is you have reference > > counters (or similar) on the thing you're deferring to a timeout. > > The other is you use a barrier so you know the work you deferred > > isn't on the timeout runners stack anymore because it's moved on > > to run the barrier work. > > > > This is consistent with tasks and interrupts. > > > > > Given this understanding, my approach was to focus on the timeout in > > > question. So my code just spins until it the timeout is no longer > > > running. > > > > > > But now I think I am mistaken. > > > > > > IIUC you're telling me (and showing me, in code) that the goal of > > > timeout_barrier() is to wait for the *whole* softclock() to return, > > > not just the one timeout. > > > > No, just the currently running timeout. > > > > Putting the barrier timeout on the end of the proc and todo cirqs > > is because there's no CIRCQ_INSERT_HEAD I can use right now. > > > > > Why do I want to wait for the whole softclock() or softclock_thread()? > > > Why not just wait for the one timeout to finish? > > > > Cos I was too lazy to write CIRCQ_INSERT_HEAD last night :D > > Okay. After thinking this over I'm pretty sure we are skinning the > same cat in two different ways. > > Your cond_wait(9) approach is fine for both timeout types because > timeout_del_barrier(9) is only safe to call from process context. 
> When I wrote my first diff I was under the impression it was safe to > call timeout_del_barrier(9) from interrupt context. But I reread the > manpage and now I see that this is not the case, my bad. > > > [...] > > > > This discussion has some relevance to taskqs too. I was also lazy when I > > implemented taskq_barrier and used task_add to put the barrier onto the > > list of work, which meant it got added to the end. Now DRM relies on > > this behaviour. Maybe I should have pushed to commit the diff below. > > > > This diff lets taskq barriers run as soon as possible, but adds compat > > to the drm workqueue stuff so they can wait for all pending work to > > finish as they expect. > > > > P.S. don't call timeout_del from inside a timeout handler. I still think > > timeout_set_proc was a mistake. > > > > [...] > > Let's pick this whole train of thought (taskq, etc.) up in a > different thread. One thing at a time. > > -- > > Updated patch: > > - Keep timeout_barrier(9). > > - In
Re: timeout_del_barrier(9): remove kernel lock
On Wed, May 05, 2021 at 11:05:06AM +1000, David Gwynne wrote: > On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote: > > > > [...] > > > > Here is where I get confused. > > > > Why do I want to wait for *all* timeouts in the queue to finish > > running? > > You don't, you just want to know that whatever timeout was running > has finished cos you don't know if that currently running timeout is > yours or not. > > > My understanding of the timeout_del_barrier(9) use case was as > > follows: > > > > 1. I have a dynamically allocated timeout struct. The timeout is > >scheduled to execute at some point in the future. > > > > 2. I want to free the memory allocated to for the timeout. > > > > 3. To safely free the memory I need to ensure the timeout > >is not executing. Until the timeout function, i.e. > >to->to_func(), has returned it isn't necessarily safe to > >free that memory. > > > > 4. If I call timeout_del_barrier(9), it will only return if the > >timeout in question is not running. Assuming the timeout itself > >is not a periodic timeout that reschedules itself we can then > >safely free the memory. > > Barriers aren't about references to timeouts, they're about references > to the work associated with a timeout. > > If you only cared about the timeout struct itself, then you can get > all the information about whether it's referenced by the timeout > queues from the return values from timeout_add and timeout_del. > timeout_add() returns 1 if the timeout subsystem takes the reference > to the timeout. If the timeout is already scheduled it returns 0 > because it's already got a reference. timeout_del returns 1 if the > timeout itself was scheduled and it removed it, therefore giving > up it's reference. If you timeout_del and it returns 0, then it > wasn't on a queue inside timeouts. Easy. > > What timeout_add and timeout_del don't tell you is whether the work > referenced by the timeout is currently running. 
The timeout runners > copy the function and argument onto the stack when they dequeue a > timeout to run, and give up the reference to the timeout when the > mutex around the timeout wheels/cirqs is released. However, the argument > is still referenced on the stack and the function to process it may be > about to run or is running. You can't tell that from the timeout_add/del > return values though. > > We provide two ways to deal with that. One is you have reference > counters (or similar) on the thing you're deferring to a timeout. > The other is you use a barrier so you know the work you deferred > isn't on the timeout runners stack anymore because it's moved on > to run the barrier work. > > This is consistent with tasks and interrupts. > > > Given this understanding, my approach was to focus on the timeout in > > question. So my code just spins until it the timeout is no longer > > running. > > > > But now I think I am mistaken. > > > > IIUC you're telling me (and showing me, in code) that the goal of > > timeout_barrier() is to wait for the *whole* softclock() to return, > > not just the one timeout. > > No, just the currently running timeout. > > Putting the barrier timeout on the end of the proc and todo cirqs > is because there's no CIRCQ_INSERT_HEAD I can use right now. > > > Why do I want to wait for the whole softclock() or softclock_thread()? > > Why not just wait for the one timeout to finish? > > Cos I was too lazy to write CIRCQ_INSERT_HEAD last night :D Okay. After thinking this over I'm pretty sure we are skinning the same cat in two different ways. Your cond_wait(9) approach is fine for both timeout types because timeout_del_barrier(9) is only safe to call from process context. When I wrote my first diff I was under the impression it was safe to call timeout_del_barrier(9) from interrupt context. But I reread the manpage and now I see that this is not the case, my bad. > [...] > > This discussion has some relevance to taskqs too. 
I was also lazy when I > implemented taskq_barrier and used task_add to put the barrier onto the > list of work, which meant it got added to the end. Now DRM relies on > this behaviour. Maybe I should have pushed to commit the diff below. > > This diff lets taskq barriers run as soon as possible, but adds compat > to the drm workqueue stuff so they can wait for all pending work to > finish as they expect. > > P.S. don't call timeout_del from inside a timeout handler. I still think > timeout_set_proc was a mistake. > > [...] Let's pick this whole train of thought (taskq, etc.) up in a different thread. One thing at a time. -- Updated patch:

- Keep timeout_barrier(9).

- In timeout_barrier(9) use the barrier timeout for both normal and process-context timeouts. All callers use cond_wait(9) now.

- Remove the kernel lock from the non-process path. I assume I don't need to splx(splsoftclock()) anymore?

- Rename "timeout_proc_barrier()" to "timeout_barrier_timeout()"
Re: timeout_del_barrier(9): remove kernel lock
On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote: > On Tue, May 04, 2021 at 11:21:27PM +1000, David Gwynne wrote: > > On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote: > > > On 04/05/21(Tue) 01:10, Scott Cheloha wrote: > > > > [...] > > > > I want to run softclock() without the kernel lock. The way to go, I > > > > think, is to first push the kernel lock down into timeout_run(), and > > > > then to remove the kernel lock from each timeout, one by one. > > > > > > Grabbing and releasing the KERNEL_LOCk() on a per-timeout basis creates > > > more latency than running all timeouts in a batch after having grabbed > > > the KERNEL_LOCK(). I doubt this is the best way forward. > > > > > > > Before we can push the kernel lock down into timeout_run() we need to > > > > remove the kernel lock from timeout_del_barrier(9). > > > > > > Seems worth it on its own. > > > > > > > The kernel lock is used in timeout_del_barrier(9) to determine whether > > > > the given timeout has stopped running. Because the softclock() runs > > > > with the kernel lock we currently assume that once the calling thread > > > > has taken the kernel lock any onging softclock() must have returned > > > > and relinquished the lock, so the timeout in question has returned. > > > > as i'll try to explain below, it's not about waiting for a specific > > timeout, it's about knowing that a currently running timeout has > > finished. > > I'm very confused about the distinction between the two. It is subtle :( > > > So you want to stop using the KERNEL_LOCK() to do the serialization? > > > > the KERNEL_LOCK in timeout_barrier cos it was an elegant^Wclever^W hack. > > because timeouts are run under the kernel lock, i knew i could take the > > lock and know that timeouts arent running anymore. that's all. > > > > in hindsight this means that the thread calling timeout_barrier spins > > when it could sleep, and worse, it spins waiting for all pending > > timeouts to run. 
so yeah, i agree that undoing the hack is worth it on > > its own. > > Okay so we're all in agreement that this could be improved, cool. Yep. > > > > The simplest replacement I can think of is a volatile pointer to the > > > > running timeout that we set before leaving the timeout_mutex and clear > > > > after reentering the same during timeout_run(). > > > > > > Sounds like a condition variable protected by this mutex. Interesting > > > that cond_wait(9) doesn't work with a mutex. > > > > cond_wait borrows the sched lock to coordinate between the waiting > > thread and the signalling context. the cond api is basically a wrapper > > around sleep_setup/sleep_finish. > > > > > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just > > > > spins until the timeout function returns and the timeout_running > > > > pointer is changed. Not every caller can sleep during > > > > timeout_del_barrier(9). I think spinning is the simplest thing that > > > > will definitely work here. > > > > > > This keeps the current semantic indeed. > > > > i don't want to throw timeout_barrier out just yet. > > I'm fine with that... where might it be useful? A single timeout_barrier call can be used as a barrier for multiple timeouts. This would feel more natural if timeout_barrier took a flag instead of a timeout pointer, or if we didnt have the timeout_proc context. 
> > > > -void
> > > > -timeout_barrier(struct timeout *to)
> > > > +int
> > > > +timeout_del_barrier(struct timeout *to)
> > > >  {
> > > > +	struct timeout barrier;
> > > > +	struct cond c = COND_INITIALIZER();
> > > >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > > > 
> > > >  	timeout_sync_order(needsproc);
> > > > 
> > > > -	if (!needsproc) {
> > > > -		KERNEL_LOCK();
> > > > -		splx(splsoftclock());
> > > > -		KERNEL_UNLOCK();
> > > > -	} else {
> > > > -		struct cond c = COND_INITIALIZER();
> > > > -		struct timeout barrier;
> > > > +	mtx_enter(&timeout_mutex);
> > > > +
> > > > +	if (timeout_del_locked(to)) {
> > > > +		mtx_leave(&timeout_mutex);
> > > > +		return 1;
> > > > +	}
> > > > 
> > > > +	if (needsproc) {
> > > >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > > >  		barrier.to_process = curproc->p_p;
> > > > -
> > > > -		mtx_enter(&timeout_mutex);
> > > >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > > >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > > >  		mtx_leave(&timeout_mutex);
> > > >  		wakeup_one(&timeout_proc);
> > > >  		cond_wait(&c, "tmobar");
> > > > +	} else {
> > > > +		mtx_leave(&timeout_mutex);
> > > > +		/* XXX Is this in the right spot? */
> > > > +		splx(splsoftclock());
> > > > +		while (timeout_running == to)
> > > > +
Re: timeout_del_barrier(9): remove kernel lock
On Tue, May 04, 2021 at 11:21:27PM +1000, David Gwynne wrote: > On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote: > > On 04/05/21(Tue) 01:10, Scott Cheloha wrote: > > > [...] > > > I want to run softclock() without the kernel lock. The way to go, I > > > think, is to first push the kernel lock down into timeout_run(), and > > > then to remove the kernel lock from each timeout, one by one. > > > > Grabbing and releasing the KERNEL_LOCk() on a per-timeout basis creates > > more latency than running all timeouts in a batch after having grabbed > > the KERNEL_LOCK(). I doubt this is the best way forward. > > > > > Before we can push the kernel lock down into timeout_run() we need to > > > remove the kernel lock from timeout_del_barrier(9). > > > > Seems worth it on its own. > > > > > The kernel lock is used in timeout_del_barrier(9) to determine whether > > > the given timeout has stopped running. Because the softclock() runs > > > with the kernel lock we currently assume that once the calling thread > > > has taken the kernel lock any onging softclock() must have returned > > > and relinquished the lock, so the timeout in question has returned. > > as i'll try to explain below, it's not about waiting for a specific > timeout, it's about knowing that a currently running timeout has > finished. I'm very confused about the distinction between the two. > > So you want to stop using the KERNEL_LOCK() to do the serialization? > > the KERNEL_LOCK in timeout_barrier cos it was an elegant^Wclever^W hack. > because timeouts are run under the kernel lock, i knew i could take the > lock and know that timeouts arent running anymore. that's all. > > in hindsight this means that the thread calling timeout_barrier spins > when it could sleep, and worse, it spins waiting for all pending > timeouts to run. so yeah, i agree that undoing the hack is worth it on > its own. Okay so we're all in agreement that this could be improved, cool. 
> > > The simplest replacement I can think of is a volatile pointer to the > > > running timeout that we set before leaving the timeout_mutex and clear > > > after reentering the same during timeout_run(). > > > > Sounds like a condition variable protected by this mutex. Interesting > > that cond_wait(9) doesn't work with a mutex. > > cond_wait borrows the sched lock to coordinate between the waiting > thread and the signalling context. the cond api is basically a wrapper > around sleep_setup/sleep_finish. > > > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just > > > spins until the timeout function returns and the timeout_running > > > pointer is changed. Not every caller can sleep during > > > timeout_del_barrier(9). I think spinning is the simplest thing that > > > will definitely work here. > > > > This keeps the current semantic indeed. > > i don't want to throw timeout_barrier out just yet. I'm fine with that... where might it be useful?

> > > -void
> > > -timeout_barrier(struct timeout *to)
> > > +int
> > > +timeout_del_barrier(struct timeout *to)
> > >  {
> > > +	struct timeout barrier;
> > > +	struct cond c = COND_INITIALIZER();
> > >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > > 
> > >  	timeout_sync_order(needsproc);
> > > 
> > > -	if (!needsproc) {
> > > -		KERNEL_LOCK();
> > > -		splx(splsoftclock());
> > > -		KERNEL_UNLOCK();
> > > -	} else {
> > > -		struct cond c = COND_INITIALIZER();
> > > -		struct timeout barrier;
> > > +	mtx_enter(&timeout_mutex);
> > > +
> > > +	if (timeout_del_locked(to)) {
> > > +		mtx_leave(&timeout_mutex);
> > > +		return 1;
> > > +	}
> > > 
> > > +	if (needsproc) {
> > >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > >  		barrier.to_process = curproc->p_p;
> > > -
> > > -		mtx_enter(&timeout_mutex);
> > >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > >  		mtx_leave(&timeout_mutex);
> > >  		wakeup_one(&timeout_proc);
> > >  		cond_wait(&c, "tmobar");
> > > +	} else {
> > > +		mtx_leave(&timeout_mutex);
> > > +		/* XXX Is this in the right spot? */
> > > +		splx(splsoftclock());
> > > +		while (timeout_running == to)
> > > +			CPU_BUSY_CYCLE();

> > Won't splx() execute the soft-interrupt if there's any pending? > > Shouldn't the barrier be before? Could you add `spc->spc_spinning++' > > around the spinning loop? What happens if two threads call > > timeout_del_barrier(9) with the same argument at the same time? Is > > it possible and/or supported? > the timeout passed to timeout_barrier is only used to figure out which > context the barrier should apply to, ie, it's used to pick between the > softint or proc queue runners. like the other barriers in other parts of > the kernel, it's just supposed to wait for the current work to finish > running. Here is where I get confused. Why do I want to wait for
Re: timeout_del_barrier(9): remove kernel lock
On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote: > On 04/05/21(Tue) 01:10, Scott Cheloha wrote: > > [...] > > I want to run softclock() without the kernel lock. The way to go, I > > think, is to first push the kernel lock down into timeout_run(), and > > then to remove the kernel lock from each timeout, one by one. > > Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates > more latency than running all timeouts in a batch after having grabbed > the KERNEL_LOCK(). I doubt this is the best way forward. I have heard different views on this. visa@ said once, and I paraphrase, "increasing the latency is a good way to motivate people to remove the kernel lock from their timeouts." If this "tough love" approach is unacceptable, we could also divide timeouts up into two queues based on the presence of a flag, say TIMEOUT_MPSAFE. You would then run part of the softclock() under the kernel lock and the other part without it. We would probably pre-sort the timeouts at scheduling time, i.e. timeout_add(9), to avoid an additional O(n) traversal during softclock() to sort them. > > Before we can push the kernel lock down into timeout_run() we need to > > remove the kernel lock from timeout_del_barrier(9). > > Seems worth it on its own. Yes. dlg@ seems to agree, too. > > The kernel lock is used in timeout_del_barrier(9) to determine whether > > the given timeout has stopped running. Because the softclock() runs > > with the kernel lock we currently assume that once the calling thread > > has taken the kernel lock any ongoing softclock() must have returned > > and relinquished the lock, so the timeout in question has returned. > > So you want to stop using the KERNEL_LOCK() to do the serialization? Exactly. > > The simplest replacement I can think of is a volatile pointer to the > > running timeout that we set before leaving the timeout_mutex and clear > > after reentering the same during timeout_run(). 
> > Sounds like a condition variable protected by this mutex. Interesting > that cond_wait(9) doesn't work with a mutex. I can't speak on this. dlg@ can. > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just > > spins until the timeout function returns and the timeout_running > > pointer is changed. Not every caller can sleep during > > timeout_del_barrier(9). I think spinning is the simplest thing that > > will definitely work here. > > This keeps the current semantic indeed. Apparently not. See dlg@'s mail.

> > -void
> > -timeout_barrier(struct timeout *to)
> > +int
> > +timeout_del_barrier(struct timeout *to)
> >  {
> > +	struct timeout barrier;
> > +	struct cond c = COND_INITIALIZER();
> >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > 
> >  	timeout_sync_order(needsproc);
> > 
> > -	if (!needsproc) {
> > -		KERNEL_LOCK();
> > -		splx(splsoftclock());
> > -		KERNEL_UNLOCK();
> > -	} else {
> > -		struct cond c = COND_INITIALIZER();
> > -		struct timeout barrier;
> > +	mtx_enter(&timeout_mutex);
> > +
> > +	if (timeout_del_locked(to)) {
> > +		mtx_leave(&timeout_mutex);
> > +		return 1;
> > +	}
> > 
> > +	if (needsproc) {
> >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> >  		barrier.to_process = curproc->p_p;
> > -
> > -		mtx_enter(&timeout_mutex);
> >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> >  		mtx_leave(&timeout_mutex);
> >  		wakeup_one(&timeout_proc);
> >  		cond_wait(&c, "tmobar");
> > +	} else {
> > +		mtx_leave(&timeout_mutex);
> > +		/* XXX Is this in the right spot? */
> > +		splx(splsoftclock());
> > +		while (timeout_running == to)
> > +			CPU_BUSY_CYCLE();

> Won't splx() execute the soft-interrupt if there's any pending? Yes, I believe that that's the point. If the calling thread is running on the primary CPU we want to give softclock() a chance to run before we spin. > Shouldn't the barrier be before? Could you add `spc->spc_spinning++' > around the spinning loop? That's easy enough, sure. 
> What happen if two threads call timeout_del_barrier(9) with the same > argument at the same time? Is it possible and/or supported? That would work as expected, yes. (okay, now onto dlg@'s mail.)
Re: timeout_del_barrier(9): remove kernel lock
On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote: > On 04/05/21(Tue) 01:10, Scott Cheloha wrote: > > [...] > > I want to run softclock() without the kernel lock. The way to go, I > > think, is to first push the kernel lock down into timeout_run(), and > > then to remove the kernel lock from each timeout, one by one. > > Grabbing and releasing the KERNEL_LOCk() on a per-timeout basis creates > more latency than running all timeouts in a batch after having grabbed > the KERNEL_LOCK(). I doubt this is the best way forward. > > > Before we can push the kernel lock down into timeout_run() we need to > > remove the kernel lock from timeout_del_barrier(9). > > Seems worth it on its own. > > > The kernel lock is used in timeout_del_barrier(9) to determine whether > > the given timeout has stopped running. Because the softclock() runs > > with the kernel lock we currently assume that once the calling thread > > has taken the kernel lock any onging softclock() must have returned > > and relinquished the lock, so the timeout in question has returned. as i'll try to explain below, it's not about waiting for a specific timeout, it's about knowing that a currently running timeout has finished. > So you want to stop using the KERNEL_LOCK() to do the serialization? the KERNEL_LOCK in timeout_barrier cos it was an elegant^Wclever^W hack. because timeouts are run under the kernel lock, i knew i could take the lock and know that timeouts arent running anymore. that's all. in hindsight this means that the thread calling timeout_barrier spins when it could sleep, and worse, it spins waiting for all pending timeouts to run. so yeah, i agree that undoing the hack is worth it on its own. > > The simplest replacement I can think of is a volatile pointer to the > > running timeout that we set before leaving the timeout_mutex and clear > > after reentering the same during timeout_run(). > > Sounds like a condition variable protected by this mutex. 
Interesting > that cond_wait(9) doesn't work with a mutex. cond_wait borrows the sched lock to coordinate between the waiting thread and the signalling context. the cond api is basically a wrapper around sleep_setup/sleep_finish. > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just > > spins until the timeout function returns and the timeout_running > > pointer is changed. Not every caller can sleep during > > timeout_del_barrier(9). I think spinning is the simplest thing that > > will definitely work here. > > This keeps the current semantic indeed. i don't want to throw timeout_barrier out just yet.

> > -void
> > -timeout_barrier(struct timeout *to)
> > +int
> > +timeout_del_barrier(struct timeout *to)
> >  {
> > +	struct timeout barrier;
> > +	struct cond c = COND_INITIALIZER();
> >  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > 
> >  	timeout_sync_order(needsproc);
> > 
> > -	if (!needsproc) {
> > -		KERNEL_LOCK();
> > -		splx(splsoftclock());
> > -		KERNEL_UNLOCK();
> > -	} else {
> > -		struct cond c = COND_INITIALIZER();
> > -		struct timeout barrier;
> > +	mtx_enter(&timeout_mutex);
> > +
> > +	if (timeout_del_locked(to)) {
> > +		mtx_leave(&timeout_mutex);
> > +		return 1;
> > +	}
> > 
> > +	if (needsproc) {
> >  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> >  		barrier.to_process = curproc->p_p;
> > -
> > -		mtx_enter(&timeout_mutex);
> >  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> >  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> >  		mtx_leave(&timeout_mutex);
> >  		wakeup_one(&timeout_proc);
> >  		cond_wait(&c, "tmobar");
> > +	} else {
> > +		mtx_leave(&timeout_mutex);
> > +		/* XXX Is this in the right spot? */
> > +		splx(splsoftclock());
> > +		while (timeout_running == to)
> > +			CPU_BUSY_CYCLE();

> Won't splx() execute the soft-interrupt if there's any pending? > Shouldn't the barrier be before? Could you add `spc->spc_spinning++' > around the spinning loop? What happens if two threads call > timeout_del_barrier(9) with the same argument at the same time? Is > it possible and/or supported? 
the timeout passed to timeout_barrier is only used to figure out which context the barrier should apply to, ie, it's used to pick between the softint or proc queue runners. like the other barriers in other parts of the kernel, it's just supposed to wait for the current work to finish running. it is unfortunate that the timeout_barrier man page isn't very clear. it's worth reading the manpage for intr_barrier and taskq_barrier. sched_barrier is a thing that exists too, but isn't documented. also have a look at the EXAMPLE in the cond_wait(9) manpage too. however, don't read the taskq_barrier code. timeout_barrier is like intr_barrier in that it uses the argument to find out where work runs, but again, it doesn't care about the specific
Re: timeout_del_barrier(9): remove kernel lock
On 04/05/21(Tue) 01:10, Scott Cheloha wrote: > [...] > I want to run softclock() without the kernel lock. The way to go, I > think, is to first push the kernel lock down into timeout_run(), and > then to remove the kernel lock from each timeout, one by one. Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates more latency than running all timeouts in a batch after having grabbed the KERNEL_LOCK(). I doubt this is the best way forward. > Before we can push the kernel lock down into timeout_run() we need to > remove the kernel lock from timeout_del_barrier(9). Seems worth it on its own. > The kernel lock is used in timeout_del_barrier(9) to determine whether > the given timeout has stopped running. Because the softclock() runs > with the kernel lock we currently assume that once the calling thread > has taken the kernel lock any ongoing softclock() must have returned > and relinquished the lock, so the timeout in question has returned. So you want to stop using the KERNEL_LOCK() to do the serialization? > The simplest replacement I can think of is a volatile pointer to the > running timeout that we set before leaving the timeout_mutex and clear > after reentering the same during timeout_run(). Sounds like a condition variable protected by this mutex. Interesting that cond_wait(9) doesn't work with a mutex. > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just > spins until the timeout function returns and the timeout_running > pointer is changed. Not every caller can sleep during > timeout_del_barrier(9). I think spinning is the simplest thing that > will definitely work here. This keeps the current semantic indeed. 
> -void
> -timeout_barrier(struct timeout *to)
> +int
> +timeout_del_barrier(struct timeout *to)
>  {
> +	struct timeout barrier;
> +	struct cond c = COND_INITIALIZER();
>  	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> 
>  	timeout_sync_order(needsproc);
> 
> -	if (!needsproc) {
> -		KERNEL_LOCK();
> -		splx(splsoftclock());
> -		KERNEL_UNLOCK();
> -	} else {
> -		struct cond c = COND_INITIALIZER();
> -		struct timeout barrier;
> +	mtx_enter(&timeout_mutex);
> +
> +	if (timeout_del_locked(to)) {
> +		mtx_leave(&timeout_mutex);
> +		return 1;
> +	}
> 
> +	if (needsproc) {
>  		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
>  		barrier.to_process = curproc->p_p;
> -
> -		mtx_enter(&timeout_mutex);
>  		SET(barrier.to_flags, TIMEOUT_ONQUEUE);
>  		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
>  		mtx_leave(&timeout_mutex);
>  		wakeup_one(&timeout_proc);
>  		cond_wait(&c, "tmobar");
> +	} else {
> +		mtx_leave(&timeout_mutex);
> +		/* XXX Is this in the right spot? */
> +		splx(splsoftclock());
> +		while (timeout_running == to)
> +			CPU_BUSY_CYCLE();

Won't splx() execute the soft-interrupt if there's any pending? Shouldn't the barrier be before? Could you add `spc->spc_spinning++' around the spinning loop? What happens if two threads call timeout_del_barrier(9) with the same argument at the same time? Is it possible and/or supported?
timeout_del_barrier(9): remove kernel lock
Hi,

I want to run softclock() without the kernel lock. The way to go, I think, is to first push the kernel lock down into timeout_run(), and then to remove the kernel lock from each timeout, one by one. Before we can push the kernel lock down into timeout_run() we need to remove the kernel lock from timeout_del_barrier(9).

The kernel lock is used in timeout_del_barrier(9) to determine whether the given timeout has stopped running. Because the softclock() runs with the kernel lock we currently assume that once the calling thread has taken the kernel lock any ongoing softclock() must have returned and relinquished the lock, so the timeout in question has returned.

The simplest replacement I can think of is a volatile pointer to the running timeout that we set before leaving the timeout_mutex and clear after reentering the same during timeout_run(). So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just spins until the timeout function returns and the timeout_running pointer is changed. Not every caller can sleep during timeout_del_barrier(9). I think spinning is the simplest thing that will definitely work here.

There is no behavior change in the TIMEOUT_PROC case. The kernel lock was never involved in that path.

Misc:

- The timeout_barrier(9) routine has one caller. Can we just fold it into timeout_del_barrier(9) or do we need to keep it separate for API completeness? See patch.

- Assuming it's okay to fold timeout_barrier() into timeout_del_barrier(): I have merged two separate timeout_mutex sections into one. Unclear if I still need a second timeout_sync_order() call before returning from timeout_del_barrier().

- Where, if anywhere, does the splx(splsoftclock()) call belong in the non-TIMEOUT_PROC case? In the current code it is between KERNEL_LOCK() and KERNEL_UNLOCK(). There is no equivalent spot in the new code.

(Manpage changes omitted for the moment to keep the diff short.) Thoughts? 
Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.83
diff -u -p -r1.83 kern_timeout.c
--- kern/kern_timeout.c	8 Feb 2021 08:18:45 -0000	1.83
+++ kern/kern_timeout.c	4 May 2021 06:05:42 -0000
@@ -79,6 +79,8 @@ struct circq timeout_proc;	/* [T] Due
 time_t timeout_level_width[WHEELCOUNT];	/* [I] Wheel level width (seconds) */
 struct timespec tick_ts;	/* [I] Length of a tick (1/hz secs) */
 
+volatile struct timeout *timeout_running;	/* [T] Running timeout, if any */
+
 struct kclock {
 	struct timespec kc_lastscan;	/* [T] Clock time at last wheel scan */
 	struct timespec kc_late;	/* [T] Late if due prior */
@@ -173,6 +175,7 @@ void softclock_process_kclock_timeout(st
 void softclock_process_tick_timeout(struct timeout *, int);
 void softclock_thread(void *);
 uint32_t timeout_bucket(const struct timeout *);
+int timeout_del_locked(struct timeout *);
 uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
 void timeout_run(struct timeout *);
 void timeout_proc_barrier(void *);
@@ -455,11 +458,12 @@ kclock_nanotime(int kclock, struct times
 }
 
 int
-timeout_del(struct timeout *to)
+timeout_del_locked(struct timeout *to)
 {
 	int ret = 0;
 
-	mtx_enter(&timeout_mutex);
+	MUTEX_ASSERT_LOCKED(&timeout_mutex);
+
 	if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
 		CIRCQ_REMOVE(&to->to_list);
 		CLR(to->to_flags, TIMEOUT_ONQUEUE);
@@ -468,52 +472,54 @@ timeout_del(struct timeout *to)
 	}
 	CLR(to->to_flags, TIMEOUT_TRIGGERED);
 	tostat.tos_deleted++;
-	mtx_leave(&timeout_mutex);
 
 	return ret;
 }
 
 int
-timeout_del_barrier(struct timeout *to)
+timeout_del(struct timeout *to)
 {
-	int removed;
+	int status;
 
-	timeout_sync_order(ISSET(to->to_flags, TIMEOUT_PROC));
-
-	removed = timeout_del(to);
-	if (!removed)
-		timeout_barrier(to);
-
-	return removed;
+	mtx_enter(&timeout_mutex);
+	status = timeout_del_locked(to);
+	mtx_leave(&timeout_mutex);
+
+	return status;
 }
 
-void
-timeout_barrier(struct timeout *to)
+int
+timeout_del_barrier(struct timeout *to)
 {
+	struct timeout barrier;
+	struct cond c = COND_INITIALIZER();
 	int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
 
 	timeout_sync_order(needsproc);
 
-	if (!needsproc) {
-		KERNEL_LOCK();
-		splx(splsoftclock());
-		KERNEL_UNLOCK();
-	} else {
-		struct cond c = COND_INITIALIZER();
-		struct timeout barrier;
+	mtx_enter(&timeout_mutex);
+
+	if (timeout_del_locked(to)) {
+		mtx_leave(&timeout_mutex);
+		return 1;
+	}
 
+	if (needsproc) {
 		timeout_set_proc(&barrier, timeout_proc_barrier, &c);
 		barrier.to_process = curproc->p_p;
-
-		mtx_enter(&timeout_mutex);