On Fri, Sep 11, 2020 at 01:17:02PM +0100, Valentin Schneider wrote:
> On 11/09/20 09:17, Peter Zijlstra wrote:
> > The intent of balance_callback() has always been to delay executing
> > balancing operations until the end of the current rq->lock section.
> > This is because balance operations must often drop rq->lock, and that
> > isn't safe in general.
> >
> > However, as noted by Scott, there were a few holes in that scheme;
> > balance_callback() was called after rq->lock was dropped, which means
> > another CPU can interleave and touch the callback list.
> >
>
> So that can be say __schedule() tail racing with some setprio; what's the
> worst that can (currently) happen here? Something like say two consecutive
> enqueuing of push_rt_tasks() to the callback list?

Yeah, but that isn't in fact the case I worry most about.

What can happen (and what I've spotted once before) is that someone
attempts to enqueue a balance_callback from a rq->lock region that
doesn't handle the calls. Currently that 'works', that is, it will get
run _eventually_. But ideally we'd want that to not work and issue a
WARN. We want the callbacks to be timely.

So basically all of these machinations are in order to add the WARN :-)
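
To make the "enqueue from a region that doesn't handle the calls" case
concrete, here is a toy userspace model. It is only a sketch, not kernel
code and not the patch itself: struct toy_rq, toy_rq_lock(),
toy_queue_callback(), toy_rq_unlock() and toy_count_hit() are all made-up
names, and the "lock" is just a flag. It assumes a lock section declares
up front whether it will run callbacks when it ends; queueing from a
section that won't is refused with a warning instead of being left to run
_eventually_.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a queued balance callback. */
struct toy_callback {
	struct toy_callback *next;
	void (*func)(void *arg);
	void *arg;
};

/* Hypothetical stand-in for a runqueue; 'locked' only models rq->lock. */
struct toy_rq {
	bool locked;
	bool handles_callbacks;	/* will this lock section run the list? */
	struct toy_callback *head;
	int warnings;
};

/* Example callback: bump the integer counter passed as 'arg'. */
static void toy_count_hit(void *arg)
{
	(*(int *)arg)++;
}

/* Enter a lock section, stating whether it runs callbacks on exit. */
static void toy_rq_lock(struct toy_rq *rq, bool handles_callbacks)
{
	assert(!rq->locked);
	rq->locked = true;
	rq->handles_callbacks = handles_callbacks;
}

/*
 * Queue a callback. Refuse, and warn, if the current section would not
 * run it; this models the WARN discussed above.
 */
static bool toy_queue_callback(struct toy_rq *rq, struct toy_callback *cb,
			       void (*func)(void *), void *arg)
{
	if (!rq->locked || !rq->handles_callbacks) {
		rq->warnings++;
		fprintf(stderr, "WARN: callback queued where nobody runs it\n");
		return false;
	}
	cb->func = func;
	cb->arg = arg;
	cb->next = rq->head;	/* push at head, so callbacks run LIFO */
	rq->head = cb;
	return true;
}

/*
 * Leave the lock section. The list is detached while the section is
 * still "locked", so nothing else can touch it once the lock is gone.
 * Returns the number of callbacks that ran.
 */
static int toy_rq_unlock(struct toy_rq *rq)
{
	struct toy_callback *head;
	int ran = 0;

	assert(rq->locked);
	head = rq->head;
	rq->head = NULL;
	rq->handles_callbacks = false;
	rq->locked = false;

	while (head) {
		struct toy_callback *next = head->next;

		head->next = NULL;
		head->func(head->arg);
		head = next;
		ran++;
	}
	return ran;
}
```

With this model, toy_rq_lock(&rq, false) followed by toy_queue_callback()
fails and bumps rq.warnings, while the same sequence after
toy_rq_lock(&rq, true) queues the callback and toy_rq_unlock() runs it
before returning.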