On Wed, Jul 02, 2014 at 10:30:56AM +0800, Yuyang Du wrote:
> The idea of per entity runnable load average (aggregated to cfs_rq and
> task_group load) was proposed by Paul Turner, and it is still followed
> by this rewrite. But this rewrite is made for the following reasons:
> 
> (1). cfs_rq's load average (namely runnable_load_avg and
> blocked_load_avg) is updated incrementally by one entity at a time,
> which means the cfs_rq load average is only partially updated, or
> asynchronous across its entities (the entity in question is up to date
> and contributes to the cfs_rq, but all other entities are effectively
> lagging behind).
> 
> (2). cfs_rq load average is different between top rq->cfs_rq and
> task_group's per CPU cfs_rqs in whether or not blocked_load_avg
> contributes to the load.

ISTR there was a reason for it; can't remember though, maybe pjt/ben can
remember.

> (3). How task_group's load is tracked is very confusing and complex.
> 
> Therefore, this rewrite tackles these by:
> 
> (1). Combine runnable and blocked load averages for cfs_rq. And track
> cfs_rq's load average as a whole (contributed by all runnable and
> blocked entities on this cfs_rq).
> 
> (2). Only track task load average. Do not track task_group's per CPU
> entity average, but track that entity's own cfs_rq's aggregated average.
> 
> This rewrite results in significantly reduced code and expected
> consistency and clarity. Also, if you plot the previous cfs_rq
> runnable_load_avg and blocked_load_avg against the new rewritten
> load_avg and compare the lines, you can see the new load_avg is much
> more continuous (no abrupt jumps up and down) and decayed/updated more
> quickly and synchronously.

OK, maybe seeing what you're doing. I worry about a few things though:

> +static inline void synchronize_tg_load_avg(struct cfs_rq *cfs_rq, u32 old)
>  {
> +       s32 delta = cfs_rq->avg.load_avg - old;
>  
> +       if (delta)
> +               atomic_long_add(delta, &cfs_rq->tg->load_avg);
>  }

That tg->load_avg cacheline is already red hot glowing, and you've just
increased the amount of updates to it.. That's not going to be pleasant.
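
To make the concern concrete, here is a minimal userspace sketch of one
way to keep traffic on a shared counter down: publish the local delta
only once it has drifted past a threshold. Everything in it is a
hypothetical stand-in (struct tg_model, struct cfs_rq_model,
sync_tg_load_batched(), and the 1/8 ratio are chosen for illustration);
it is not the code from the patch, and it uses C11 atomics rather than
the kernel's atomic_long_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared task_group load sum. */
struct tg_model {
	atomic_long load_avg;         /* shared across CPUs, contended */
};

/* Hypothetical stand-in for one per-CPU cfs_rq. */
struct cfs_rq_model {
	struct tg_model *tg;
	long load_avg;                /* local average, cheap to update */
	long tg_contrib;              /* what this cfs_rq last published */
	unsigned long shared_writes;  /* counts atomic updates, for illustration */
};

/*
 * Publish the local delta to the shared sum only when it exceeds 1/8
 * of the last published contribution (an assumed threshold). Small
 * fluctuations stay local and never touch the shared cacheline.
 */
static void sync_tg_load_batched(struct cfs_rq_model *cfs_rq)
{
	long delta;

	assert(cfs_rq->tg != NULL);
	delta = cfs_rq->load_avg - cfs_rq->tg_contrib;

	if (labs(delta) > cfs_rq->tg_contrib / 8) {
		atomic_fetch_add(&cfs_rq->tg->load_avg, delta);
		cfs_rq->tg_contrib += delta;
		cfs_rq->shared_writes++;
	}
}
```

The trade-off is that the shared sum can lag each cfs_rq by up to the
threshold, in exchange for fewer writes to the contended line. The
quoted synchronize_tg_load_avg() instead writes on every non-zero delta.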


> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> +     struct sched_avg *sa = &se->avg;
> +     struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +     u64 now = cfs_rq_clock_task(cfs_rq);
> +     u32 old_load_avg = cfs_rq->avg.load_avg;
> +     int migrated = 0;
>  
> +     if (entity_is_task(se)) {
> +             if (sa->last_update_time == 0) {
> +                     sa->last_update_time = now;
> +                     migrated = 1;
>               }
> +             else
> +                     __update_load_avg(now, sa, se->on_rq * se->load.weight);
>       }
>  
> +     __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>  
> +     if (migrated)
> +             cfs_rq->avg.load_avg += sa->load_avg;
>  
> +     synchronize_tg_load_avg(cfs_rq, old_load_avg);
>  }

So here you add the task to the cfs_rq avg when it's been migrated in,
however:

> @@ -4552,17 +4326,9 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>       struct sched_entity *se = &p->se;
>       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>  
> +     /* Update task on old CPU, then ready to go (entity must be off the queue) */
> +     __update_load_avg(cfs_rq_clock_task(cfs_rq), &se->avg, 0);
> +     se->avg.last_update_time = 0;
>  
>       /* We have migrated, no longer consider this task hot */
>       se->exec_start = 0;

there you don't remove it first..
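
A toy model of the accounting, as a sketch only: the types and helpers
below (rq_model, task_model, enqueue_model(), migrate_no_remove(),
migrate_with_remove()) are hypothetical and deliberately ignore decay,
so they overstate how long the stale contribution lingers. In the real
code the old cfs_rq's average would presumably decay toward its actual
weight over time. The point is just that right after the move the
task's load is counted on both queues unless the old one subtracts it.

```c
#include <assert.h>

/* Hypothetical, decay-free stand-ins for cfs_rq->avg and se->avg. */
struct rq_model {
	long load_avg;
};

struct task_model {
	long load_avg;
	int migrated;   /* plays the role of last_update_time == 0 */
};

/* Mirrors the quoted enqueue path: add the load only for a migrated task. */
static void enqueue_model(struct rq_model *rq, struct task_model *t)
{
	assert(t->load_avg >= 0);
	if (t->migrated) {
		rq->load_avg += t->load_avg;
		t->migrated = 0;
	}
}

/* Mirrors the quoted migrate path: flag the task, leave the old rq alone. */
static void migrate_no_remove(struct rq_model *old_rq, struct task_model *t)
{
	(void)old_rq;
	t->migrated = 1;
}

/* Hypothetical alternative: subtract from the old rq before moving. */
static void migrate_with_remove(struct rq_model *old_rq, struct task_model *t)
{
	old_rq->load_avg -= t->load_avg;
	if (old_rq->load_avg < 0)
		old_rq->load_avg = 0;
	t->migrated = 1;
}
```

With an old queue at 1000 and a task contributing 300, the first
variant leaves the two queues at 1000 and 300 after the move, while the
second leaves them at 700 and 300.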
