Hi Yuyang,

On Tue, Jun 16, 2015 at 03:26:05AM +0800, Yuyang Du wrote:
> @@ -5977,36 +5786,6 @@ static void attach_tasks(struct lb_env *env)
>  }
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -/*
> - * update tg->load_weight by folding this cpu's load_avg
> - */
> -static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
> -{
> -     struct sched_entity *se = tg->se[cpu];
> -     struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
> -
> -     /* throttled entities do not contribute to load */
> -     if (throttled_hierarchy(cfs_rq))
> -             return;
> -
> -     update_cfs_rq_blocked_load(cfs_rq, 1);
> -
> -     if (se) {
> -             update_entity_load_avg(se, 1);
> -             /*
> -              * We pivot on our runnable average having decayed to zero for
> -              * list removal.  This generally implies that all our children
> -              * have also been removed (modulo rounding error or bandwidth
> -              * control); however, such cases are rare and we can fix these
> -              * at enqueue.
> -              *
> -              * TODO: fix up out-of-order children on enqueue.
> -              */
> -             if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
> -                     list_del_leaf_cfs_rq(cfs_rq);
> -     }
> -}
> -
>  static void update_blocked_averages(int cpu)
>  {
>       struct rq *rq = cpu_rq(cpu);
> @@ -6015,17 +5794,17 @@ static void update_blocked_averages(int cpu)
>  
>       raw_spin_lock_irqsave(&rq->lock, flags);
>       update_rq_clock(rq);
> +
>       /*
>        * Iterates the task_group tree in a bottom up fashion, see
>        * list_add_leaf_cfs_rq() for details.
>        */
>       for_each_leaf_cfs_rq(rq, cfs_rq) {
> -             /*
> -              * Note: We may want to consider periodically releasing
> -              * rq->lock about these updates so that creating many task
> -              * groups does not result in continually extending hold time.
> -              */
> -             __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
> +             /* throttled entities do not contribute to load */
> +             if (throttled_hierarchy(cfs_rq))
> +                     continue;
> +
> +             update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);

We iterate the task_group tree (actually the corresponding cfs_rq tree on
that cpu) because we want to do a bottom-up update. And we want a
bottom-up update because we want to update weighted_cpuload().
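For reference, weighted_cpuload() reads only the root cfs_rq of that cpu,
so every group's load change must already have been propagated all the
way up for it to be accurate. Roughly, with the patch applied (my reading
of the series, so the exact field spelling may differ):

	static unsigned long weighted_cpuload(const int cpu)
	{
		/* only the root cfs_rq's average is consulted */
		return cpu_rq(cpu)->cfs.avg.load_avg;
	}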

__update_blocked_averages_cpu(tg, cpu) does three things:

Let's say:
cfs_rq = tg->cfs_rq[cpu]
se = tg->se[cpu]
pcfs_rq = cfs_rq_of(se), which is the parent of cfs_rq.

1. update cfs_rq->blocked_load_avg and its contrib to its task group.
2. update se->avg and calculate the deltas of se->avg.*_avg_contrib.
3. update pcfs_rq->*_load_avg with the deltas from step 2.
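
To map these steps back onto the removed helper quoted above (my
annotation of the pre-patch calls, not new code):

	/* step 1 */
	update_cfs_rq_blocked_load(cfs_rq, 1);

	/*
	 * steps 2 and 3: update_entity_load_avg() refreshes se->avg,
	 * computes the *_avg_contrib deltas and folds them into
	 * cfs_rq_of(se), i.e. pcfs_rq
	 */
	update_entity_load_avg(se, 1);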

In this way, __update_blocked_averages_cpu(tg, cpu) contributes tg's
load changes to its parent, so that update_blocked_averages() can
aggregate all load changes into weighted_cpuload().

However, update_cfs_rq_load_avg() only updates cfs_rq->avg; the change
is neither contributed nor aggregated to cfs_rq's parent anywhere in the
for_each_leaf_cfs_rq loop, so this is actually not a bottom-up update.
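
To make the gap concrete, the patch's update_cfs_rq_load_avg() boils
down to something like this (my paraphrase, with the __update_load_avg()
argument list elided, so don't take the exact signature literally):

	static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
	{
		int decayed;

		/*
		 * decay and update cfs_rq->avg in place; note that
		 * neither tg->se[cpu] nor the parent cfs_rq is touched
		 */
		decayed = __update_load_avg(now, &cfs_rq->avg /* , ... */);

		return decayed;
	}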

To fix this, I think we can add an update_cfs_shares(cfs_rq) call after
update_cfs_rq_load_avg(). Like:

        for_each_leaf_cfs_rq(rq, cfs_rq) {
-               /*
-                * Note: We may want to consider periodically releasing
-                * rq->lock about these updates so that creating many task
-                * groups does not result in continually extending hold time.
-                */
-               __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
+               /* throttled entities do not contribute to load */
+               if (throttled_hierarchy(cfs_rq))
+                       continue;
+
+               update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+               update_cfs_shares(cfs_rq);
        }
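
For context, update_cfs_shares() at this point looks roughly like the
following (condensed from mainline, so treat it as a sketch): it
reweights the group's se on the parent cfs_rq, which is what would push
this cfs_rq's load change one level up.

	static void update_cfs_shares(struct cfs_rq *cfs_rq)
	{
		struct task_group *tg = cfs_rq->tg;
		struct sched_entity *se = tg->se[cpu_of(rq_of(cfs_rq))];
		long shares;

		if (!se || throttled_hierarchy(cfs_rq))
			return;

		shares = calc_cfs_shares(cfs_rq, tg);
		/* adjusts se's weight on cfs_rq_of(se), the parent */
		reweight_entity(cfs_rq_of(se), se, shares);
	}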

However, I think update_cfs_shares() isn't cheap, because it may do a
bottom-up update each time it is called. So how about just updating the
root cfs_rq?
Like:

-       /*
-        * Iterates the task_group tree in a bottom up fashion, see
-        * list_add_leaf_cfs_rq() for details.
-        */
-       for_each_leaf_cfs_rq(rq, cfs_rq) {
-               /*
-                * Note: We may want to consider periodically releasing
-                * rq->lock about these updates so that creating many task
-                * groups does not result in continually extending hold time.
-                */
-               __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
-       }
+       update_cfs_rq_load_avg(rq_clock_task(rq), &rq->cfs);

Thanks and Best Regards,
Boqun
