On 09/18/2018 04:35 PM, Rik van Riel wrote: > On Tue, 2018-09-18 at 15:22 +0200, Jan H. Schönherr wrote: [...] > Task priorities in a flat runqueue are relatively straightforward, with > vruntime scaling just like done for nice levels, but I have to admit > that throttled groups provide a challenge.
Do you know/have an idea how the flat approach would further skew the approximations currently done in calc_group_shares()/calc_group_runnable()? With the nested hierarchy the (shared) task group SE is updated whenever something changes. With the flat approach, you'd only be able to update a task SE, when you have to touch the task anyway. Just from thinking briefly about it, it feels like values would be out of date for longer periods of time. >> However, the only efficient way that I can currently think of, is a >> hybrid model between the "full nesting" that is currently there, and >> the "no nesting" you were describing above. >> >> It would flatten all task groups that do not actively contribute some >> function, which would be all task groups purely for accounting >> purposes and those for *unthrottled* CFS hierarchies (and those for >> coscheduling that contain exactly one SE in a runqueue). The nesting >> would still be kept for *throttled* hierarchies (and the coscheduling >> stuff). (And if you wouldn't have mentioned a way to get rid of >> nesting completely, I would have kept a single level of nesting for >> accounting purposes as well.) >> >> This would allow us to lazily dequeue SEs that have run out of >> bandwidth when we encounter them, and already enqueue them in the >> nested task group (whose SE is not enqueued at the moment). That way, >> it's still a O(1) operation to re-enable all tasks, once runtime is >> available again. And O(1) to throttle a repeat offender. > > I suspect most systems will have a number of runnable tasks no larger > than the number of CPUs most of the time. > > That makes "re-enable all the tasks" often equivalent to "re-enable one > task". > > Can we handle the re-enabling (or waking up!) of one task almost as fast > as we can without the cpu controller? We could start by transparently special-casing the "just one SE in a runqueue" case, where that single SE is enqueued directly into the next parent, and everything falls back to nesting, the moment a second SE pops up. That might allow reaping the benefits for many cases without hurting other cases. It's also more a gradual conversion of code. Regards Jan