On 23/09/16 15:30, Vincent Guittot wrote:
> Hi Matt,
>
> On 23 September 2016 at 13:58, Matt Fleming <[email protected]> wrote:
>> Since commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new
>> tasks") ::last_update_time will be set to a non-zero value in
>> post_init_entity_util_avg(), which leads to p->se.avg.load_avg being
>> decayed on enqueue before the task has even had a chance to run.
>>
>> For a NICE_0 task the sequence of events leading up to this with
>> example load average changes might be,
>>
>> sched_fork()
>>   init_entity_runnable_average()
>>     p->se.avg.load_avg = scale_load_down(se->load.weight); // 1024
>>
>> wake_up_new_task()
>>   post_init_entity_util_avg()
>>     attach_entity_load_avg()
>>       p->se.last_update_time = cfs_rq->avg.last_update_time;
>>
>>   activate_task()
>>     enqueue_task()
>>       ...
>>       enqueue_entity_load_avg()
>>         migrated = !sa->last_update_time // false
>>         if (!migrated)
>>           __update_load_avg()
>>             p->se.avg.load_avg = 1002
>
> Does it mean that you can see the perf drop that you mention below
> because load is decayed to 1002 instead of staying to 1024 ?
I think Matt is talking about the fact that cfs_rq->runnable_load_avg
is 0 once the hackbench task is dequeued for the first time. Without
this patch, the value of se->avg.load_avg (1002 in both cases) is
exactly the same when we add it to cfs_rq->runnable_load_avg in
enqueue_entity_load_avg() and when we subtract it in
dequeue_entity_load_avg(), because the initial runtime is too short
(~250us on my hikey board) to change it. With this patch we add 1024
and subtract ~1002, so cfs_rq->runnable_load_avg keeps a small
positive value after the dequeue, which makes it more likely that
(load-based) fork balancing will pick another cpu for the next
hackbench task.

>
> 1002 mainly comes from period_contrib being set to 1023 during
> init_entity_runnable_average so any delay longer than 1us between
> attach_entity_load_avg and enqueue_entity_load_avg will trig the decay
> of the load from 1024 to 1002
> [...]
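
To make the arithmetic concrete, here is a minimal standalone
userspace sketch of both accounting variants (my own illustration,
not kernel code; the variable names are made up). It only assumes
that the PELT decay factor y satisfies y^32 = 0.5, i.e. the load
halves every 32 periods of 1024us:

  /* pelt-sketch.c: build with "gcc pelt-sketch.c -lm" */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
  	double y = pow(0.5, 1.0 / 32.0);  /* per-period decay, y ~= 0.97857 */
  	unsigned long init_load = 1024;   /* scale_load_down(se->load.weight) */

  	/* period_contrib is preset to 1023, so >= 1us of delay between
  	 * attach_entity_load_avg() and enqueue_entity_load_avg() crosses
  	 * a period boundary and decays the load once: 1024 -> ~1002. */
  	unsigned long decayed = (unsigned long)(init_load * y);

  	/* Without the patch: enqueue adds the already-decayed value,
  	 * ~250us of runtime barely changes it, and dequeue subtracts
  	 * the same amount, so runnable_load_avg drops back to 0. */
  	long runnable_without = decayed - decayed;

  	/* With the patch: enqueue adds the undecayed 1024, dequeue
  	 * subtracts ~1002, leaving a small positive residue. */
  	long runnable_with = init_load - decayed;

  	printf("decayed=%lu without=%ld with=%ld\n",
  	       decayed, runnable_without, runnable_with);
  	return 0;
  }

This prints decayed=1002 without=0 with=22, i.e. the ~22 units of
residue left in cfs_rq->runnable_load_avg after the dequeue are what
load-based fork balancing gets to see.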

