On Wed, Jul 02, 2014 at 10:30:56AM +0800, Yuyang Du wrote:
> The idea of per entity runnable load average (aggregated to cfs_rq and
> task_group load) was proposed by Paul Turner, and it is still followed by
> this rewrite. But this rewrite is made due to the following ends:
>
> (1). cfs_rq's load average (namely runnable_load_avg and blocked_load_avg) is
> updated incrementally by one entity at one time, which means the cfs_rq load
> average is only partially updated or asynchronous across its entities (the
> entity in question is up to date and contributes to the cfs_rq, but all other
> entities are effectively lagging behind).
>
> (2). cfs_rq load average is different between top rq->cfs_rq and task_group's
> per CPU cfs_rqs in whether or not blocked_load_average contributes to the load.
ISTR there was a reason for it; can't remember though, maybe pjt/ben can
remember.
> (3). How task_group's load is tracked is very confusing and complex.
>
> Therefore, this rewrite tackles these by:
>
> (1). Combine runnable and blocked load averages for cfs_rq. And track
> cfs_rq's load average as a whole (contributed by all runnable and blocked
> entities on this cfs_rq).
>
> (2). Only track task load average. Do not track task_group's per CPU entity
> average, but track that entity's own cfs_rq's aggregated average.
>
> This rewrite results in significantly reduced code and expected consistency
> and clarity. Also, if you draw the lines of the previous cfs_rq
> runnable_load_avg and blocked_load_avg and the new rewritten load_avg, then
> compare those lines, you can see the new load_avg is much more continuous
> (no abrupt jumps up and down) and decayed/updated more quickly and
> synchronously.
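For reference, tracking the cfs_rq average "as a whole" can be pictured with a small floating-point model. This is only a sketch: the kernel uses fixed-point arithmetic, and `toy_update_load_avg()` and `TOY_Y` are made-up names. The one thing taken from the existing load tracking is the decay rate, where a contribution halves after 32 periods.

```c
#include <assert.h>

/* Per-period decay factor, chosen so that TOY_Y^32 is roughly 0.5. */
#define TOY_Y 0.97857206

/*
 * Advance one average by `periods` periods during which the queue
 * carried a constant `weight`. With weight == 0 the average only
 * decays; with a constant weight it converges towards that weight.
 */
static double toy_update_load_avg(double avg, double weight,
				  unsigned int periods)
{
	assert(avg >= 0.0 && weight >= 0.0);

	while (periods--)
		avg = avg * TOY_Y + weight * (1.0 - TOY_Y);

	return avg;
}
```

Because the whole queue is advanced in one step from its total weight, there is no window in which one entity is up to date while the others lag behind.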
OK, maybe seeing what you're doing. I worry about a few things though:
> +static inline void synchronize_tg_load_avg(struct cfs_rq *cfs_rq, u32 old)
>  {
> +	s32 delta = cfs_rq->avg.load_avg - old;
>
> +	if (delta)
> +		atomic_long_add(delta, &cfs_rq->tg->load_avg);
>  }
That tg->load_avg cacheline is already red hot glowing, and you've just
increased the amount of updates to it.. That's not going to be pleasant.
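One way to keep the traffic down would be to fold the local average into the shared sum only once it has drifted far enough from what was last published. A rough userspace sketch follows; the `toy_*` types, the `published` field and the 1/8 ratio are all assumptions for illustration, not anything in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_tg {
	atomic_long load_avg;	/* shared across CPUs: the hot cacheline */
};

struct toy_cfs_rq {
	struct toy_tg *tg;
	long load_avg;		/* local average, cheap to update */
	long published;		/* value last folded into tg->load_avg */
};

/*
 * Fold the local average into the shared sum only when it has moved by
 * more than 1/8 of the last published value (assumed ratio), or when
 * the caller forces it. Returns 1 if the shared counter was written.
 */
static int toy_sync_tg_load_avg(struct toy_cfs_rq *cfs_rq, int force)
{
	long delta;

	assert(cfs_rq && cfs_rq->tg);
	delta = cfs_rq->load_avg - cfs_rq->published;

	if (!delta)
		return 0;
	if (!force && labs(delta) <= cfs_rq->published / 8)
		return 0;

	atomic_fetch_add(&cfs_rq->tg->load_avg, delta);
	cfs_rq->published = cfs_rq->load_avg;
	return 1;
}
```

The shared sum then lags the per-CPU value by at most the threshold, which is the price for writing the contended line less often.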
> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> +	struct sched_avg *sa = &se->avg;
> +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 now = cfs_rq_clock_task(cfs_rq);
> +	u32 old_load_avg = cfs_rq->avg.load_avg;
> +	int migrated = 0;
>
> +	if (entity_is_task(se)) {
> +		if (sa->last_update_time == 0) {
> +			sa->last_update_time = now;
> +			migrated = 1;
>  		}
> +		else
> +			__update_load_avg(now, sa, se->on_rq * se->load.weight);
>  	}
>
> +	__update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>
> +	if (migrated)
> +		cfs_rq->avg.load_avg += sa->load_avg;
>
> +	synchronize_tg_load_avg(cfs_rq, old_load_avg);
>  }
So here you add the task to the cfs_rq avg when it gets migrated in,
however:
> @@ -4552,17 +4326,9 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  	struct sched_entity *se = &p->se;
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
> +	/* Update task on old CPU, then ready to go (entity must be off the queue) */
> +	__update_load_avg(cfs_rq_clock_task(cfs_rq), &se->avg, 0);
> +	se->avg.last_update_time = 0;
>
>  	/* We have migrated, no longer consider this task hot */
>  	se->exec_start = 0;
there you don't remove it first..
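To make the imbalance concrete, here is a toy model of the two halves. The `mig_*` names are hypothetical, and decay is ignored, so it only shows the instantaneous double counting: without a removal on the source queue, the task's contribution is counted on both queues right after the move.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-ins for sched_entity and cfs_rq. */
struct mig_se { long load_avg; };
struct mig_rq { long load_avg; };

/* Enqueue of a freshly migrated task: mirrors the "if (migrated)" add. */
static void mig_attach(struct mig_rq *dst, const struct mig_se *se)
{
	assert(dst && se);
	dst->load_avg += se->load_avg;
}

/* The assumed missing half: take the contribution off the source queue. */
static void mig_detach(struct mig_rq *src, const struct mig_se *se)
{
	assert(src && se);
	src->load_avg -= se->load_avg;
	if (src->load_avg < 0)
		src->load_avg = 0;	/* clamp, the average cannot go negative */
}

/* Move se from src to dst; returns the combined load of both queues. */
static long mig_migrate(struct mig_rq *src, struct mig_rq *dst,
			const struct mig_se *se, int remove_first)
{
	if (remove_first)
		mig_detach(src, se);
	mig_attach(dst, se);
	return src->load_avg + dst->load_avg;
}
```

With a source queue at 500 (of which the task is 300) and a destination at 100, the combined load goes from 600 to 900 when nothing is removed, and stays at 600 when the contribution is taken off the source first.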

