On 04/02/16 09:54, Juri Lelli wrote:
> Hi Steve,
>
> first of all thanks a lot for your detailed report, if only all bug
> reports were like this.. :)
>
> On 03/02/16 13:55, Steven Rostedt wrote:
[...]

>
> Right. I think this is the same thing that happens after hotplug. IIRC
> the code paths are actually the same. The problem is that hotplug or
> cpuset reconfiguration operations are destructive w.r.t. root_domains,
> so we lose bandwidth information when that happens. The problem is that
> we only store cumulative information regarding bandwidth in root_domain,
> while information about which task belongs to which cpuset is stored in
> cpuset data structures.
>
> I tried to fix this a while back, but my attempt was broken: I failed
> to get locking right and, even though it seemed to fix the issue for me,
> it was prone to race conditions. You might still want to have a look at
> that for reference: https://lkml.org/lkml/2015/9/2/162
>

[...]

>
> It's good that we can recover, but that's still a bug yes :/.
>
> I'll try to see if my broken patch makes what you are seeing apparently
> disappear, so that we can at least confirm that we are seeing the same
> problem; you could do the same if you want, I pushed that here
>

No, it doesn't solve this :/. I placed the restoring code in the hotplug
workfn, so updates generated by toggling sched_load_balance don't get
caught, of course. But this at least tells us that we should solve this
someplace else.

Best,

- Juri