On Tue, Oct 18, 2016 at 12:15:11PM +0100, Dietmar Eggemann wrote:
> On 18/10/16 10:07, Peter Zijlstra wrote:
> > On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:
> > On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> > online_fair_sched_group() cures this, this would actually match with
> > unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> > a few instructions on the enqueue path, so that's all good.

> Yes, I was able to recreate a similar problem (not related to the cpu
> masks) on ARM64 (6 logical cpus). I created 100 2nd-level tg's but only
> put one task (no cpu affinity, so it could run on multiple cpus) in one
> of these tg's (mainly to see the related cfs_rq's in /proc/sched_debug).
>
> I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1

Ah, and since all those CPUs are online, we decay all that load away. OK,
makes sense now.

> > I'm just not immediately seeing how that cures things. The only relevant
> > user of the leaf_cfs_rq list seems to be update_blocked_averages(), which
> > is called from the balance code (idle_balance() and
> > rebalance_domains()). But neither should call that for offline (or
> > !present) CPUs.
>
> Assuming this is load from the 99 2nd-level tg's which never had a task
> running, putting list_add_leaf_cfs_rq() into online_fair_sched_group()
> for all cpus makes sure that all the 'blocked load' gets decayed.
>
> Doing what Vincent just suggested, not initializing tg se's w/ 1024 but
> w/ 0 instead, prevents this from being necessary.

Indeed. I just worry about the cases where we do not propagate the load
up, e.g. the stuff fixed by:

  1476695653-12309-5-git-send-email-vincent.guit...@linaro.org

If we hit an intermediary cgroup with 0 load, we might get some
interactivity issues. But it could be I got lost again :-)
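
FWIW, the list_add_leaf_cfs_rq() cure we're talking about would be
something along the lines of the below; completely untested sketch, with
the existing body of online_fair_sched_group() elided, so take the
surrounding loop and locking as illustrative rather than as a patch:

	void online_fair_sched_group(struct task_group *tg)
	{
		int i;

		for_each_possible_cpu(i) {
			struct rq *rq = cpu_rq(i);

			raw_spin_lock_irq(&rq->lock);
			/*
			 * Put the group's cfs_rq on the rq's leaf list, so
			 * update_blocked_averages() decays its blocked load
			 * even if no task ever runs in this group on this CPU.
			 */
			list_add_leaf_cfs_rq(tg->cfs_rq[i]);
			raw_spin_unlock_irq(&rq->lock);
		}
	}

That would mirror unregister_fair_sched_group() doing
list_del_leaf_cfs_rq() on the teardown side, and avoid having to do the
list_add on the enqueue path.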