Re: timers: Move clearing of base::timer_running under base::lock
On 2020-12-11 15:36:27 [+0100], Thomas Gleixner wrote:
> So the change at hand does not make things worse, right?

Correct. This *ping*-*pong* was there before this patch because
->running_timer is always cleared after the callback finishes. I was
just curious why all of a sudden there are *that* many users of the
corner case.

> Thanks,
>
>         tglx

Sebastian
Re: timers: Move clearing of base::timer_running under base::lock
On Tue, Dec 08 2020 at 09:50, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
>> > Yes, but it triggers frequently. Like `rcuc' is somehow aligned with
>> > the timeout.
>>
>> Given that a lot of RCU processing is event-driven based on timers,
>> and given that the scheduling-clock interrupts are synchronized for
>> energy-efficiency reasons on many configs, maybe this alignment is
>> expected behavior?
>
> No, it is the fact that rcu_preempt has a higher priority than
> ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
> context switch and expire_timers() has this:
>
> |	raw_spin_unlock_irq(&base->lock);
> |	call_timer_fn(timer, fn, baseclk);
> |	raw_spin_lock_irq(&base->lock);
> |	base->running_timer = NULL;
> |	timer_sync_wait_running(base);
>
> So ->running_timer isn't reset and try_to_del_timer_sync() (called by
> del_timer_sync() from schedule_timeout()) returns -1 and then the
> corner case is handled where `expiry_lock' is acquired. So everything
> goes as expected.

Well, but even without that change you have the same situation:

    timer_fn()
      wakeup()
                              -->preemption
                              del_timer_sync()
                                if (running)
                                  wait_for_running()
                                    lock(expiry)
    running = NULL
    sync_wait_running()
      unlock(expiry)
      wakeup_lock()
                              -->preemption
    ...
    lock(base)

So the change at hand does not make things worse, right?

Thanks,

        tglx
Re: timers: Move clearing of base::timer_running under base::lock
On Tue, Dec 08, 2020 at 09:50:49AM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
> > > Yes, but it triggers frequently. Like `rcuc' is somehow aligned with
> > > the timeout.
> >
> > Given that a lot of RCU processing is event-driven based on timers,
> > and given that the scheduling-clock interrupts are synchronized for
> > energy-efficiency reasons on many configs, maybe this alignment is
> > expected behavior?
>
> No, it is the fact that rcu_preempt has a higher priority than
> ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
> context switch and expire_timers() has this:
>
> |	raw_spin_unlock_irq(&base->lock);
> |	call_timer_fn(timer, fn, baseclk);
> |	raw_spin_lock_irq(&base->lock);
> |	base->running_timer = NULL;
> |	timer_sync_wait_running(base);
>
> So ->running_timer isn't reset and try_to_del_timer_sync() (called by
> del_timer_sync() from schedule_timeout()) returns -1 and then the
> corner case is handled where `expiry_lock' is acquired. So everything
> goes as expected.

Makes sense!  Thank you for the explanation!

							Thanx, Paul
Re: timers: Move clearing of base::timer_running under base::lock
On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
> > Yes, but it triggers frequently. Like `rcuc' is somehow aligned with
> > the timeout.
>
> Given that a lot of RCU processing is event-driven based on timers,
> and given that the scheduling-clock interrupts are synchronized for
> energy-efficiency reasons on many configs, maybe this alignment is
> expected behavior?

No, it is the fact that rcu_preempt has a higher priority than
ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
context switch and expire_timers() has this:

|	raw_spin_unlock_irq(&base->lock);
|	call_timer_fn(timer, fn, baseclk);
|	raw_spin_lock_irq(&base->lock);
|	base->running_timer = NULL;
|	timer_sync_wait_running(base);

So ->running_timer isn't reset and try_to_del_timer_sync() (called by
del_timer_sync() from schedule_timeout()) returns -1 and then the
corner case is handled where `expiry_lock' is acquired. So everything
goes as expected.

> 							Thanx, Paul

Sebastian
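The corner case Sebastian describes can be illustrated with a toy model in plain C. This is a sketch under assumptions, not the kernel implementation; all `toy_` names are hypothetical stand-ins. The point it shows: as long as the base still publishes the timer as its running timer, deletion cannot succeed and returns -1, which is exactly the window between call_timer_fn() and the clearing of ->running_timer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, NOT the kernel code: the expiry path publishes the
 * currently executing timer in base->running_timer.  While it is
 * published, deletion fails with -1, which is the corner case where
 * the caller falls back to waiting on the expiry lock.
 */
struct toy_timer {
	int pending;			/* queued on the wheel? */
};

struct toy_timer_base {
	struct toy_timer *running_timer; /* set around call_timer_fn() */
};

/* Hypothetical stand-in for try_to_del_timer_sync(). */
static int toy_try_to_del_timer_sync(struct toy_timer_base *base,
				     struct toy_timer *timer)
{
	if (base->running_timer == timer)
		return -1;		/* callback running: caller must wait */

	if (timer->pending) {
		timer->pending = 0;	/* detach_if_pending() succeeded */
		return 1;
	}

	return 0;			/* timer was not queued */
}
```

In the trace discussed above, the woken task reaches this function while the callback that woke it has not yet cleared ->running_timer, so the -1 path is taken reliably.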
Re: timers: Move clearing of base::timer_running under base::lock
On Mon, Dec 07, 2020 at 04:25:33PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 15:29:50 [+0100], Thomas Gleixner wrote:
> > On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> > > One thing I noticed while testing it is that the "corner" case in
> > > timer_sync_wait_running() is quite reliably hit by rcu_preempt's
> > > rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.
> >
> > I assume it's something like this:
> >
> >     timeout -> wakeup
> >                          -->preemption
> >                          del_timer_sync()
> >     ...
>
> Yes, but it triggers frequently. Like `rcuc' is somehow aligned with
> the timeout.

Given that a lot of RCU processing is event-driven based on timers,
and given that the scheduling-clock interrupts are synchronized for
energy-efficiency reasons on many configs, maybe this alignment is
expected behavior?

							Thanx, Paul

> |      <idle>-0      [007] dN.h4.. 46.299705: sched_wakeup: comm=rcuc/7 pid=53 prio=98 target_cpu=007
> |      <idle>-0      [007] d...2.. 46.299728: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/7 next_pid=53 next_prio=98
> |      rcuc/7-53     [007] d...2.. 46.299742: sched_switch: prev_comm=rcuc/7 prev_pid=53 prev_prio=98 prev_state=S ==> next_comm=ksoftirqd/7 next_pid=54 next_prio=120
> | ksoftirqd/7-54     [007] .13     46.299750: timer_expire_entry: timer=3bd1e045 function=process_timeout now=4294903802 baseclk=4294903802
> | ksoftirqd/7-54     [007] d...213 46.299750: sched_waking: comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54     [007] dN..313 46.299754: sched_wakeup: comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54     [007] dN..213 46.299756: sched_stat_runtime: comm=ksoftirqd/7 pid=54 runtime=13265 [ns] vruntime=3012610540 [ns]
> | ksoftirqd/7-54     [007] d...213 46.299760: sched_switch: prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=R+ ==> next_comm=rcu_preempt next_pid=11 next_prio=98
> | rcu_preempt-11     [007] d...311 46.299766: sched_pi_setprio: comm=ksoftirqd/7 pid=54 oldprio=120 newprio=98
> del_timer_sync()
> | rcu_preempt-11     [007] d...211 46.299773: sched_switch: prev_comm=rcu_preempt prev_pid=11 prev_prio=98 prev_state=R+ ==> next_comm=ksoftirqd/7 next_pid=54 next_prio=98
> | ksoftirqd/7-54     [007] .13     46.299774: timer_expire_exit: timer=3bd1e045
> | ksoftirqd/7-54     [007] dN..311 46.299784: sched_pi_setprio: comm=ksoftirqd/7 pid=54 oldprio=98 newprio=120
> | ksoftirqd/7-54     [007] dN..311 46.299788: sched_waking: comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54     [007] dN..411 46.299790: sched_wakeup: comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54     [007] dN..311 46.299792: sched_stat_runtime: comm=ksoftirqd/7 pid=54 runtime=7404 [ns] vruntime=3012617944 [ns]
> | ksoftirqd/7-54     [007] d...2.. 46.299797: sched_switch: prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=S ==> next_comm=rcu_preempt next_pid=11 next_prio=98
>
> > Thanks,
> >
> >         tglx
>
> Sebastian
Re: timers: Move clearing of base::timer_running under base::lock
On 2020-12-07 15:29:50 [+0100], Thomas Gleixner wrote:
> On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> > One thing I noticed while testing it is that the "corner" case in
> > timer_sync_wait_running() is quite reliably hit by rcu_preempt's
> > rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.
>
> I assume it's something like this:
>
>     timeout -> wakeup
>                          -->preemption
>                          del_timer_sync()
>     ...

Yes, but it triggers frequently. Like `rcuc' is somehow aligned with
the timeout.

|      <idle>-0      [007] dN.h4.. 46.299705: sched_wakeup: comm=rcuc/7 pid=53 prio=98 target_cpu=007
|      <idle>-0      [007] d...2.. 46.299728: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/7 next_pid=53 next_prio=98
|      rcuc/7-53     [007] d...2.. 46.299742: sched_switch: prev_comm=rcuc/7 prev_pid=53 prev_prio=98 prev_state=S ==> next_comm=ksoftirqd/7 next_pid=54 next_prio=120
| ksoftirqd/7-54     [007] .13     46.299750: timer_expire_entry: timer=3bd1e045 function=process_timeout now=4294903802 baseclk=4294903802
| ksoftirqd/7-54     [007] d...213 46.299750: sched_waking: comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54     [007] dN..313 46.299754: sched_wakeup: comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54     [007] dN..213 46.299756: sched_stat_runtime: comm=ksoftirqd/7 pid=54 runtime=13265 [ns] vruntime=3012610540 [ns]
| ksoftirqd/7-54     [007] d...213 46.299760: sched_switch: prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=R+ ==> next_comm=rcu_preempt next_pid=11 next_prio=98
| rcu_preempt-11     [007] d...311 46.299766: sched_pi_setprio: comm=ksoftirqd/7 pid=54 oldprio=120 newprio=98
del_timer_sync()
| rcu_preempt-11     [007] d...211 46.299773: sched_switch: prev_comm=rcu_preempt prev_pid=11 prev_prio=98 prev_state=R+ ==> next_comm=ksoftirqd/7 next_pid=54 next_prio=98
| ksoftirqd/7-54     [007] .13     46.299774: timer_expire_exit: timer=3bd1e045
| ksoftirqd/7-54     [007] dN..311 46.299784: sched_pi_setprio: comm=ksoftirqd/7 pid=54 oldprio=98 newprio=120
| ksoftirqd/7-54     [007] dN..311 46.299788: sched_waking: comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54     [007] dN..411 46.299790: sched_wakeup: comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54     [007] dN..311 46.299792: sched_stat_runtime: comm=ksoftirqd/7 pid=54 runtime=7404 [ns] vruntime=3012617944 [ns]
| ksoftirqd/7-54     [007] d...2.. 46.299797: sched_switch: prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=S ==> next_comm=rcu_preempt next_pid=11 next_prio=98

> Thanks,
>
>         tglx

Sebastian
Re: timers: Move clearing of base::timer_running under base::lock
On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> On 2020-12-06 22:40:07 [+0100], Thomas Gleixner wrote:
>> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
>> NULL without holding base::lock in expire_timers().
>>
>> This looks innocent and most reads are clearly not problematic, but for a
>> non-RT kernel it's completely irrelevant whether the store happens before
>> or after taking the lock. For an RT kernel moving the store under the lock
>> requires an extra unlock/lock pair in the case that there is a waiter for
>> the timer. But that's not the end of the world and definitely not worth the
>> trouble of adding boatloads of comments and annotations to the code. Famous
>> last words...
>>
>> Reported-by: syzbot+aa7c2385d46c5eba0...@syzkaller.appspotmail.com
>> Reported-by: syzbot+abea4558531bae1ba...@syzkaller.appspotmail.com
>> Signed-off-by: Thomas Gleixner
>
> One thing I noticed while testing it is that the "corner" case in
> timer_sync_wait_running() is quite reliably hit by rcu_preempt's
> rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.

I assume it's something like this:

    timeout -> wakeup
                         -->preemption
                         del_timer_sync()
    ...

Thanks,

        tglx
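The sequence sketched here (the timer callback wakes a higher-priority task, the woken task preempts the expiry thread before ->running_timer is cleared, and its del_timer_sync() then has to wait on the expiry lock) can be modelled in userspace with pthreads. This is a deterministic toy model under assumptions, not kernel code; all `toy_` names and the mutex standing in for `expiry_lock` are illustrative:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/*
 * Userspace sketch (illustrative names, not kernel code): the
 * "callback" wakes the sleeper before running_timer is cleared, so the
 * sleeper's del_timer_sync() observes the timer as running and must
 * serialize on the expiry lock until the callback has completed.
 */
static atomic_int running_timer;	/* nonzero while the callback runs */
static atomic_int woken;		/* the wakeup issued by the callback */
static pthread_mutex_t expiry_lock = PTHREAD_MUTEX_INITIALIZER;

static void *toy_expire_timers(void *arg)
{
	struct timespec ts = { 0, 1000000 };	/* 1 ms */

	pthread_mutex_lock(&expiry_lock);	/* held across expiry, as on RT */
	atomic_store(&running_timer, 1);
	atomic_store(&woken, 1);		/* call_timer_fn(): wakeup() */
	nanosleep(&ts, NULL);			/* window where preemption hits */
	atomic_store(&running_timer, 0);	/* base->running_timer = NULL */
	pthread_mutex_unlock(&expiry_lock);	/* timer_sync_wait_running() */
	return NULL;
}

/*
 * The woken task's del_timer_sync(): if the timer still appears to be
 * running, take/release the expiry lock to wait for the callback.
 * Returns the final state of running_timer (expected: 0).
 */
static int toy_del_timer_sync(void)
{
	while (!atomic_load(&woken))
		;				/* spin until the wakeup */

	if (atomic_load(&running_timer)) {
		pthread_mutex_lock(&expiry_lock);
		pthread_mutex_unlock(&expiry_lock);
	}
	return atomic_load(&running_timer);
}
```

Because the expiry lock is taken before the wakeup is published, the deleter can never sneak in between the wakeup and the clearing of running_timer: either it sees the timer already idle, or it blocks on the lock until the callback is done.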
Re: timers: Move clearing of base::timer_running under base::lock
On 2020-12-06 22:40:07 [+0100], Thomas Gleixner wrote:
> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> NULL without holding base::lock in expire_timers().
>
> This looks innocent and most reads are clearly not problematic, but for a
> non-RT kernel it's completely irrelevant whether the store happens before
> or after taking the lock. For an RT kernel moving the store under the lock
> requires an extra unlock/lock pair in the case that there is a waiter for
> the timer. But that's not the end of the world and definitely not worth the
> trouble of adding boatloads of comments and annotations to the code. Famous
> last words...
>
> Reported-by: syzbot+aa7c2385d46c5eba0...@syzkaller.appspotmail.com
> Reported-by: syzbot+abea4558531bae1ba...@syzkaller.appspotmail.com
> Signed-off-by: Thomas Gleixner

One thing I noticed while testing it is that the "corner" case in
timer_sync_wait_running() is quite reliably hit by rcu_preempt's
rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.

Sebastian
Re: timers: Move clearing of base::timer_running under base::lock
On Mon, Dec 07, 2020 at 01:25:13PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 07, 2020 at 02:10:13AM +0100, Frederic Weisbecker wrote:
> > On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> > > syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> > > NULL without holding base::lock in expire_timers().
> > >
> > > This looks innocent and most reads are clearly not problematic, but for a
> > > non-RT kernel it's completely irrelevant whether the store happens before
> > > or after taking the lock. For an RT kernel moving the store under the lock
> > > requires an extra unlock/lock pair in the case that there is a waiter for
> > > the timer. But that's not the end of the world and definitely not worth the
> > > trouble of adding boatloads of comments and annotations to the code. Famous
> > > last words...
> >
> > There is another thing I noticed lately wrt. del_timer_sync() vs. timer
> > execution:
> >
> > Here if the timer has previously executed on CPU 1 and then CPU 0 sees
> > base->running_timer == NULL, it will return, assuming the timer has
> > completed. But there is nothing to enforce that x will be equal to 1.
> > Enforcing that is a behaviour I would expect in this case since this is
> > a kind of "wait for completion" function. But perhaps it doesn't apply
> > here, in fact I have no idea...
> >
> > But if we recognize that as an issue, we would need a mirroring
> > load_acquire()/store_release() on base->running_timer.
>
> Yeah, I think you're right. del_timer_sync() explicitly states it waits
> for completion of the handler, so it isn't weird to then also expect to
> be able to observe the results of the handler.
>
> Thomas' patch fixes this.

Indeed! Thanks.
Re: timers: Move clearing of base::timer_running under base::lock
On Mon, Dec 07, 2020 at 02:10:13AM +0100, Frederic Weisbecker wrote:
> On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> > syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> > NULL without holding base::lock in expire_timers().
> >
> > This looks innocent and most reads are clearly not problematic, but for a
> > non-RT kernel it's completely irrelevant whether the store happens before
> > or after taking the lock. For an RT kernel moving the store under the lock
> > requires an extra unlock/lock pair in the case that there is a waiter for
> > the timer. But that's not the end of the world and definitely not worth the
> > trouble of adding boatloads of comments and annotations to the code. Famous
> > last words...
>
> There is another thing I noticed lately wrt. del_timer_sync() vs. timer
> execution:
>
> Here if the timer has previously executed on CPU 1 and then CPU 0 sees
> base->running_timer == NULL, it will return, assuming the timer has
> completed. But there is nothing to enforce that x will be equal to 1.
> Enforcing that is a behaviour I would expect in this case since this is
> a kind of "wait for completion" function. But perhaps it doesn't apply
> here, in fact I have no idea...
>
> But if we recognize that as an issue, we would need a mirroring
> load_acquire()/store_release() on base->running_timer.

Yeah, I think you're right. del_timer_sync() explicitly states it waits
for completion of the handler, so it isn't weird to then also expect to
be able to observe the results of the handler.

Thomas' patch fixes this.
Re: timers: Move clearing of base::timer_running under base::lock
On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> NULL without holding base::lock in expire_timers().
>
> This looks innocent and most reads are clearly not problematic, but for a
> non-RT kernel it's completely irrelevant whether the store happens before
> or after taking the lock. For an RT kernel moving the store under the lock
> requires an extra unlock/lock pair in the case that there is a waiter for
> the timer. But that's not the end of the world and definitely not worth the
> trouble of adding boatloads of comments and annotations to the code. Famous
> last words...

There is another thing I noticed lately wrt. del_timer_sync() vs. timer
execution:

    int data = 0;

    void timer_func(struct timer_list *t)
    {
            data = 1;
    }

    CPU 0                                              CPU 1
    -----                                              -----
    base = lock_timer_base(timer, &flags);             raw_spin_unlock(&base->lock);
    if (base->running_timer != timer)                  call_timer_fn(timer, fn, baseclk);
        ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
    raw_spin_unlock_irqrestore(&base->lock, flags);    raw_spin_lock(&base->lock);
    x = data;

Here if the timer has previously executed on CPU 1 and then CPU 0 sees
base->running_timer == NULL, it will return, assuming the timer has
completed. But there is nothing to enforce that x will be equal to 1.
Enforcing that is a behaviour I would expect in this case since this is
a kind of "wait for completion" function. But perhaps it doesn't apply
here, in fact I have no idea...

But if we recognize that as an issue, we would need a mirroring
load_acquire()/store_release() on base->running_timer.
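The mirroring load_acquire()/store_release() suggested above can be sketched with C11 atomics. This is an illustrative userspace model, not the kernel's actual types or accessors; the `toy_` names are hypothetical. The release store orders the callback's plain write to `data` before the clearing of `running_timer`, so a deleter whose acquire load observes NULL is guaranteed to also observe `data == 1` (the same guarantee Thomas's patch obtains from base::lock's acquire/release semantics):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * C11 sketch (illustrative, not the kernel implementation) of the
 * ordering in question.  The release store of NULL publishes the
 * callback's side effects; an acquire load that observes NULL is
 * therefore guaranteed to observe data == 1 as well.
 */
static int data;
static void *_Atomic toy_running_timer;

static void toy_expiry_side(void *timer)
{
	atomic_store_explicit(&toy_running_timer, timer,
			      memory_order_relaxed);
	data = 1;		/* timer_func(): the callback's side effect */
	/* release: orders the data write before clearing running_timer */
	atomic_store_explicit(&toy_running_timer, NULL,
			      memory_order_release);
}

/*
 * Toy del_timer_sync() fast path: if the acquire load sees NULL, the
 * callback has completed and its writes are visible.  Returns the
 * observed data, or -1 if the callback is still marked as running.
 */
static int toy_deleter_side(void)
{
	if (atomic_load_explicit(&toy_running_timer,
				 memory_order_acquire) == NULL)
		return data;
	return -1;		/* corner case: must wait for the callback */
}
```

In a cross-CPU run the acquire/release pair is what forbids the deleter from returning with a stale `data`; with plain (relaxed) accesses nothing would enforce that, which is exactly the concern raised here.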