Re: timers: Move clearing of base::timer_running under base::lock

2020-12-11 Thread Sebastian Andrzej Siewior
On 2020-12-11 15:36:27 [+0100], Thomas Gleixner wrote:
> So the change at hand does not make things worse, right?

Correct. This *ping*-*pong* was there before this patch because
->running_timer is always cleared after the callback finishes.

I was just curious why out of a sudden there are *that* many users of
the corner case. 

> Thanks,
> 
> tglx

Sebastian


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-11 Thread Thomas Gleixner
On Tue, Dec 08 2020 at 09:50, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
>> > Yes, but it triggers frequently. Like `rcuc' is somehow is aligned with
>> > the timeout.
>> 
>> Given that a lot of RCU processing is event-driven based on timers,
>> and given that the scheduling-clock interrupts are synchronized for
>> energy-efficiency reasons on many configs, maybe this alignment is
>> expected behavior?
>
> No, it is the fact that rcu_preempt has a higher priority than
> ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
> context switch and expire_timers() has this:
>
> |   raw_spin_unlock_irq(&base->lock);
> |   call_timer_fn(timer, fn, baseclk);
> |   raw_spin_lock_irq(&base->lock);
> |   base->running_timer = NULL;
> |   timer_sync_wait_running(base);
>
> So ->running_timer isn't reset and try_to_del_timer_sync() (that
> del_timer_sync() from schedule_timeout()) returns -1 and then the corner
> case is handled where `expiry_lock' is acquired. So everything goes as
> expected.

Well, but even without that change you have the same situation:

  timer_fn()
wakeup()
  -->preemption
del_timer_sync()
  if (running)
 wait_for_running()
   lock(expiry)

 running = NULL
 sync_wait_running()
   unlock(expiry)
 wakeup_lock()
  -->preemption
 ...

lock(base)
 
So the change at hand does not make things worse, right?

Thanks,

tglx


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-08 Thread Paul E. McKenney
On Tue, Dec 08, 2020 at 09:50:49AM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
> > > Yes, but it triggers frequently. Like `rcuc' is somehow is aligned with
> > > the timeout.
> > 
> > Given that a lot of RCU processing is event-driven based on timers,
> > and given that the scheduling-clock interrupts are synchronized for
> > energy-efficiency reasons on many configs, maybe this alignment is
> > expected behavior?
> 
> No, it is the fact that rcu_preempt has a higher priority than
> ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
> context switch and expire_timers() has this:
> 
> |   raw_spin_unlock_irq(&base->lock);
> |   call_timer_fn(timer, fn, baseclk);
> |   raw_spin_lock_irq(&base->lock);
> |   base->running_timer = NULL;
> |   timer_sync_wait_running(base);
> 
> So ->running_timer isn't reset and try_to_del_timer_sync() (that
> del_timer_sync() from schedule_timeout()) returns -1 and then the corner
> case is handled where `expiry_lock' is acquired. So everything goes as
> expected.

Makes sense!  Thank you for the explanation!

Thanx, Paul


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-08 Thread Sebastian Andrzej Siewior
On 2020-12-07 08:06:48 [-0800], Paul E. McKenney wrote:
> > Yes, but it triggers frequently. Like `rcuc' is somehow is aligned with
> > the timeout.
> 
> Given that a lot of RCU processing is event-driven based on timers,
> and given that the scheduling-clock interrupts are synchronized for
> energy-efficiency reasons on many configs, maybe this alignment is
> expected behavior?

No, it is the fact that rcu_preempt has a higher priority than
ksoftirqd. So immediately after the wakeup (of rcu_preempt) there is a
context switch and expire_timers() has this:

|   raw_spin_unlock_irq(&base->lock);
|   call_timer_fn(timer, fn, baseclk);
|   raw_spin_lock_irq(&base->lock);
|   base->running_timer = NULL;
|   timer_sync_wait_running(base);

So ->running_timer isn't reset and try_to_del_timer_sync() (that
del_timer_sync() from schedule_timeout()) returns -1 and then the corner
case is handled where `expiry_lock' is acquired. So everything goes as
expected.

>   Thanx, Paul

Sebastian


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Paul E. McKenney
On Mon, Dec 07, 2020 at 04:25:33PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-07 15:29:50 [+0100], Thomas Gleixner wrote:
> > On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> > > One thing I noticed while testing it is that the "corner" case in
> > > timer_sync_wait_running() is quite reliably hit by rcu_preempt
> > > rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.
> > 
> > I assume it's something like this:
> > 
> >  timeout -> wakeup
> > 
> > ->preemption
> > del_timer_sync()
> > .
> 
> Yes, but it triggers frequently. Like `rcuc' is somehow is aligned with
> the timeout.

Given that a lot of RCU processing is event-driven based on timers,
and given that the scheduling-clock interrupts are synchronized for
energy-efficiency reasons on many configs, maybe this alignment is
expected behavior?

Thanx, Paul

> |  -0   [007] dN.h4..46.299705: sched_wakeup: 
> comm=rcuc/7 pid=53 prio=98 target_cpu=007
> |  -0   [007] d...2..46.299728: sched_switch: 
> prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> 
> next_comm=rcuc/7 next_pid=53 next_prio=98
> |  rcuc/7-53  [007] d...2..46.299742: sched_switch: 
> prev_comm=rcuc/7 prev_pid=53 prev_prio=98 prev_state=S ==> 
> next_comm=ksoftirqd/7 next_pid=54 next_prio=120
> | ksoftirqd/7-54  [007] .1346.299750: timer_expire_entry: 
> timer=3bd1e045 function=process_timeout now=4294903802 
> baseclk=4294903802
> | ksoftirqd/7-54  [007] d...21346.299750: sched_waking: 
> comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54  [007] dN..31346.299754: sched_wakeup: 
> comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54  [007] dN..21346.299756: sched_stat_runtime: 
> comm=ksoftirqd/7 pid=54 runtime=13265 [ns] vruntime=3012610540 [ns]
> | ksoftirqd/7-54  [007] d...21346.299760: sched_switch: 
> prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=R+ ==> 
> next_comm=rcu_preempt next_pid=11 next_prio=98
> | rcu_preempt-11  [007] d...31146.299766: sched_pi_setprio: 
> comm=ksoftirqd/7 pid=54 oldprio=120 newprio=98
> del_timer_sync()
> | rcu_preempt-11  [007] d...21146.299773: sched_switch: 
> prev_comm=rcu_preempt prev_pid=11 prev_prio=98 prev_state=R+ ==> 
> next_comm=ksoftirqd/7 next_pid=54 next_prio=98
> | ksoftirqd/7-54  [007] .1346.299774: timer_expire_exit: 
> timer=3bd1e045
> | ksoftirqd/7-54  [007] dN..31146.299784: sched_pi_setprio: 
> comm=ksoftirqd/7 pid=54 oldprio=98 newprio=120
> | ksoftirqd/7-54  [007] dN..31146.299788: sched_waking: 
> comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54  [007] dN..41146.299790: sched_wakeup: 
> comm=rcu_preempt pid=11 prio=98 target_cpu=007
> | ksoftirqd/7-54  [007] dN..31146.299792: sched_stat_runtime: 
> comm=ksoftirqd/7 pid=54 runtime=7404 [ns] vruntime=3012617944 [ns]
> | ksoftirqd/7-54  [007] d...2..46.299797: sched_switch: 
> prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=S ==> 
> next_comm=rcu_preempt next_pid=11 next_prio=98
> 
> 
> > Thanks,
> > 
> > tglx
> 
> Sebastian


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Sebastian Andrzej Siewior
On 2020-12-07 15:29:50 [+0100], Thomas Gleixner wrote:
> On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> > One thing I noticed while testing it is that the "corner" case in
> > timer_sync_wait_running() is quite reliably hit by rcu_preempt
> > rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.
> 
> I assume it's something like this:
> 
>  timeout -> wakeup
> 
> ->preemption
> del_timer_sync()
> .

Yes, but it triggers frequently. Like `rcuc' is somehow is aligned with
the timeout.

|  -0   [007] dN.h4..46.299705: sched_wakeup: comm=rcuc/7 
pid=53 prio=98 target_cpu=007
|  -0   [007] d...2..46.299728: sched_switch: 
prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/7 
next_pid=53 next_prio=98
|  rcuc/7-53  [007] d...2..46.299742: sched_switch: 
prev_comm=rcuc/7 prev_pid=53 prev_prio=98 prev_state=S ==> 
next_comm=ksoftirqd/7 next_pid=54 next_prio=120
| ksoftirqd/7-54  [007] .1346.299750: timer_expire_entry: 
timer=3bd1e045 function=process_timeout now=4294903802 
baseclk=4294903802
| ksoftirqd/7-54  [007] d...21346.299750: sched_waking: 
comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54  [007] dN..31346.299754: sched_wakeup: 
comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54  [007] dN..21346.299756: sched_stat_runtime: 
comm=ksoftirqd/7 pid=54 runtime=13265 [ns] vruntime=3012610540 [ns]
| ksoftirqd/7-54  [007] d...21346.299760: sched_switch: 
prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=R+ ==> 
next_comm=rcu_preempt next_pid=11 next_prio=98
| rcu_preempt-11  [007] d...31146.299766: sched_pi_setprio: 
comm=ksoftirqd/7 pid=54 oldprio=120 newprio=98
del_timer_sync()
| rcu_preempt-11  [007] d...21146.299773: sched_switch: 
prev_comm=rcu_preempt prev_pid=11 prev_prio=98 prev_state=R+ ==> 
next_comm=ksoftirqd/7 next_pid=54 next_prio=98
| ksoftirqd/7-54  [007] .1346.299774: timer_expire_exit: 
timer=3bd1e045
| ksoftirqd/7-54  [007] dN..31146.299784: sched_pi_setprio: 
comm=ksoftirqd/7 pid=54 oldprio=98 newprio=120
| ksoftirqd/7-54  [007] dN..31146.299788: sched_waking: 
comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54  [007] dN..41146.299790: sched_wakeup: 
comm=rcu_preempt pid=11 prio=98 target_cpu=007
| ksoftirqd/7-54  [007] dN..31146.299792: sched_stat_runtime: 
comm=ksoftirqd/7 pid=54 runtime=7404 [ns] vruntime=3012617944 [ns]
| ksoftirqd/7-54  [007] d...2..46.299797: sched_switch: 
prev_comm=ksoftirqd/7 prev_pid=54 prev_prio=120 prev_state=S ==> 
next_comm=rcu_preempt next_pid=11 next_prio=98


> Thanks,
> 
> tglx

Sebastian


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Thomas Gleixner
On Mon, Dec 07 2020 at 14:07, Sebastian Andrzej Siewior wrote:
> On 2020-12-06 22:40:07 [+0100], Thomas Gleixner wrote:
>> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
>> NULL without holding base::lock in expire_timers().
>> 
>> This looks innocent and most reads are clearly not problematic but for a
>> non-RT kernel it's completely irrelevant whether the store happens before
>> or after taking the lock. For an RT kernel moving the store under the lock
>> requires an extra unlock/lock pair in the case that there is a waiter for
>> the timer. But that's not the end of the world and definitely not worth the
>> trouble of adding boatloads of comments and annotations to the code. Famous
>> last words...
>> 
>> Reported-by: syzbot+aa7c2385d46c5eba0...@syzkaller.appspotmail.com
>> Reported-by: syzbot+abea4558531bae1ba...@syzkaller.appspotmail.com
>> Signed-off-by: Thomas Gleixner 
>
> One thing I noticed while testing it is that the "corner" case in
> timer_sync_wait_running() is quite reliably hit by rcu_preempt
> rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.

I assume it's something like this:

 timeout -> wakeup

->preemption
del_timer_sync()
.

Thanks,

tglx


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Sebastian Andrzej Siewior
On 2020-12-06 22:40:07 [+0100], Thomas Gleixner wrote:
> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> NULL without holding base::lock in expire_timers().
> 
> This looks innocent and most reads are clearly not problematic but for a
> non-RT kernel it's completely irrelevant whether the store happens before
> or after taking the lock. For an RT kernel moving the store under the lock
> requires an extra unlock/lock pair in the case that there is a waiter for
> the timer. But that's not the end of the world and definitely not worth the
> trouble of adding boatloads of comments and annotations to the code. Famous
> last words...
> 
> Reported-by: syzbot+aa7c2385d46c5eba0...@syzkaller.appspotmail.com
> Reported-by: syzbot+abea4558531bae1ba...@syzkaller.appspotmail.com
> Signed-off-by: Thomas Gleixner 

One thing I noticed while testing it is that the "corner" case in
timer_sync_wait_running() is quite reliably hit by rcu_preempt
rcu_gp_fqs_loop() -> swait_event_idle_timeout_exclusive() invocation.

Sebastian


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Frederic Weisbecker
On Mon, Dec 07, 2020 at 01:25:13PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 07, 2020 at 02:10:13AM +0100, Frederic Weisbecker wrote:
> > On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> > > syzbot reported KCSAN data races vs. timer_base::timer_running being set 
> > > to
> > > NULL without holding base::lock in expire_timers().
> > > 
> > > This looks innocent and most reads are clearly not problematic but for a
> > > non-RT kernel it's completely irrelevant whether the store happens before
> > > or after taking the lock. For an RT kernel moving the store under the lock
> > > requires an extra unlock/lock pair in the case that there is a waiter for
> > > the timer. But that's not the end of the world and definitely not worth 
> > > the
> > > trouble of adding boatloads of comments and annotations to the code. 
> > > Famous
> > > last words...
> > 
> > There is another thing I noticed lately wrt. del_timer_sync() VS timer 
> > execution:
> 
> > Here if the timer has previously executed on CPU 1 and then CPU 0 sees 
> > base->running_timer == NULL,
> > it will return, assuming the timer has completed. But there is nothing to 
> > enforce the fact that x
> > will be equal to 1. Enforcing that is a behaviour I would expect in this 
> > case since this is a kind
> > of "wait for completion" function. But perhaps it doesn't apply here, in 
> > fact I have no idea...
> > 
> > But if we recognize that as an issue, we would need a mirroring 
> > load_acquire()/store_release() on
> > base->running_timer.
> 
> Yeah, I think you're right. del_timer_sync() explicitly states it waits
> for completion of the handler, so it isn't weird to then also expect to
> be able to observe the results of the handler.
> 
> Thomas' patch fixes this.

Indeed!

Thanks.


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-07 Thread Peter Zijlstra
On Mon, Dec 07, 2020 at 02:10:13AM +0100, Frederic Weisbecker wrote:
> On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> > syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> > NULL without holding base::lock in expire_timers().
> > 
> > This looks innocent and most reads are clearly not problematic but for a
> > non-RT kernel it's completely irrelevant whether the store happens before
> > or after taking the lock. For an RT kernel moving the store under the lock
> > requires an extra unlock/lock pair in the case that there is a waiter for
> > the timer. But that's not the end of the world and definitely not worth the
> > trouble of adding boatloads of comments and annotations to the code. Famous
> > last words...
> 
> There is another thing I noticed lately wrt. del_timer_sync() VS timer 
> execution:

> Here if the timer has previously executed on CPU 1 and then CPU 0 sees 
> base->running_timer == NULL,
> it will return, assuming the timer has completed. But there is nothing to 
> enforce the fact that x
> will be equal to 1. Enforcing that is a behaviour I would expect in this case 
> since this is a kind
> of "wait for completion" function. But perhaps it doesn't apply here, in fact 
> I have no idea...
> 
> But if we recognize that as an issue, we would need a mirroring 
> load_acquire()/store_release() on
> base->running_timer.

Yeah, I think you're right. del_timer_sync() explicitly states it waits
for completion of the handler, so it isn't weird to then also expect to
be able to observe the results of the handler.

Thomas' patch fixes this.


Re: timers: Move clearing of base::timer_running under base::lock

2020-12-06 Thread Frederic Weisbecker
On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote:
> syzbot reported KCSAN data races vs. timer_base::timer_running being set to
> NULL without holding base::lock in expire_timers().
> 
> This looks innocent and most reads are clearly not problematic but for a
> non-RT kernel it's completely irrelevant whether the store happens before
> or after taking the lock. For an RT kernel moving the store under the lock
> requires an extra unlock/lock pair in the case that there is a waiter for
> the timer. But that's not the end of the world and definitely not worth the
> trouble of adding boatloads of comments and annotations to the code. Famous
> last words...

There is another thing I noticed lately wrt. del_timer_sync() VS timer 
execution:


int data = 0;

void timer_func(struct timer_list *t)
{
data = 1;
}

 CPU 0   CPU 1
-- 
--
base = lock_timer_base(timer, &flags); 
raw_spin_unlock(&base->lock);
if (base->running_timer != timer)  
call_timer_fn(timer, fn, baseclk);
ret = detach_if_pending(timer, base, true);
base->running_timer = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags);
raw_spin_lock(&base->lock);

x = data;


Here if the timer has previously executed on CPU 1 and then CPU 0 sees 
base->running_timer == NULL,
it will return, assuming the timer has completed. But there is nothing to 
enforce the fact that x
will be equal to 1. Enforcing that is a behaviour I would expect in this case 
since this is a kind
of "wait for completion" function. But perhaps it doesn't apply here, in fact I 
have no idea...

But if we recognize that as an issue, we would need a mirroring 
load_acquire()/store_release() on
base->running_timer.