Tasks being dequeued for the last time (state == TASK_DEAD) are dequeued with the DEQUEUE_SLEEP flag, which causes their load and utilization contributions to be added to the runqueue blocked load and utilization. Hence the blocked sums will contain load or utilization that has gone away. The issue only exists for the root cfs_rq, as cgroup_exit() doesn't set DEQUEUE_SLEEP for task group exits.
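To illustrate the accounting problem described above, here is a minimal user-space sketch. It is not kernel code: struct toy_rq, toy_enqueue(), toy_dequeue(), toy_cpu_load() and TOY_DEQUEUE_SLEEP are hypothetical names invented for this illustration, and the decay of blocked load over time is deliberately left out. It only shows that a contribution dequeued with the sleep flag moves into the blocked sum, while a dead task's contribution should be dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's DEQUEUE_SLEEP flag. */
#define TOY_DEQUEUE_SLEEP 0x1

/* Toy runqueue: tracks only the two sums discussed above. */
struct toy_rq {
	unsigned long runnable_load;
	unsigned long blocked_load;
};

/* Add a task's load contribution to the runnable sum. */
static void toy_enqueue(struct toy_rq *rq, unsigned long contrib)
{
	rq->runnable_load += contrib;
}

/*
 * Remove a task's contribution from the runnable sum. If the task is
 * going to sleep, its contribution moves to the blocked sum. A dead
 * task never wakes up, so the sleep flag is masked out first and the
 * contribution is dropped entirely.
 */
static void toy_dequeue(struct toy_rq *rq, unsigned long contrib,
			int flags, bool dead)
{
	if (dead)
		flags &= ~TOY_DEQUEUE_SLEEP;	/* clear only this one bit */

	rq->runnable_load -= contrib;
	if (flags & TOY_DEQUEUE_SLEEP)
		rq->blocked_load += contrib;
}

/* runnable+blocked estimate of the cpu load. */
static unsigned long toy_cpu_load(const struct toy_rq *rq)
{
	return rq->runnable_load + rq->blocked_load;
}
```

With two tasks of contribution 100 and 50 enqueued, putting the first to sleep leaves toy_cpu_load() at 150. Dequeuing the second as a dead task brings it to 100; without the dead-task check it would stay at 150 and the estimate would include load that no longer exists.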
If runnable+blocked load is to be used as a better estimate for cpu load, the dead task contributions need to be removed to prevent load_balance() (idle_balance() in particular) from over-estimating the cpu load.

cc: Ingo Molnar <mi...@redhat.com>
cc: Peter Zijlstra <pet...@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmus...@arm.com>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e40cd88..d045404 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3202,6 +3202,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
+	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
+		flags &= ~DEQUEUE_SLEEP;
 	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
 
 	update_stats_dequeue(cfs_rq, se);
-- 
1.9.1