On Mon, Dec 07, 2020 at 01:25:13PM +0100, Peter Zijlstra wrote: > On Mon, Dec 07, 2020 at 02:10:13AM +0100, Frederic Weisbecker wrote: > > On Sun, Dec 06, 2020 at 10:40:07PM +0100, Thomas Gleixner wrote: > > > syzbot reported KCSAN data races vs. timer_base::timer_running being set > > > to > > > NULL without holding base::lock in expire_timers(). > > > > > > This looks innocent and most reads are clearly not problematic but for a > > > non-RT kernel it's completely irrelevant whether the store happens before > > > or after taking the lock. For an RT kernel moving the store under the lock > > > requires an extra unlock/lock pair in the case that there is a waiter for > > > the timer. But that's not the end of the world and definitely not worth > > > the > > > trouble of adding boatloads of comments and annotations to the code. > > > Famous > > > last words... > > > > There is another thing I noticed lately wrt. del_timer_sync() VS timer > > execution: > > > Here if the timer has previously executed on CPU 1 and then CPU 0 sees > > base->running_timer == NULL, > > it will return, assuming the timer has completed. But there is nothing to > > enforce the fact that x > > will be equal to 1. Enforcing that is a behaviour I would expect in this > > case since this is a kind > > of "wait for completion" function. But perhaps it doesn't apply here, in > > fact I have no idea... > > > > But if we recognize that as an issue, we would need a mirroring > > load_acquire()/store_release() on > > base->running_timer. > > Yeah, I think you're right. del_timer_sync() explicitly states it waits > for completion of the handler, so it isn't weird to then also expect to > be able to observe the results of the handler. > > Thomas' patch fixes this.
Indeed! Thanks.