On 15 September 2016 at 15:11, Dietmar Eggemann <dietmar.eggem...@arm.com> wrote:
> On 12/09/16 08:47, Vincent Guittot wrote:
>> When a task moves from/to a cfs_rq, we set a flag which is then used to
>> propagate the change at parent level (sched_entity and cfs_rq) during
>> next update. If the cfs_rq is throttled, the flag will stay pending until
>> the cfs_rw is unthrottled.
>>
>> For propagating the utilization, we copy the utilization of child cfs_rq to
>
> s/child/group ?
>
>> the sched_entity.
>>
>> For propagating the load, we have to take into account the load of the
>> whole task group in order to evaluate the load of the sched_entity.
>> Similarly to what was done before the rewrite of PELT, we add a correction
>> factor in case the task group's load is less than its share so it will
>> contribute the same load of a task of equal weight.
>
> What about cfs_rq->runnable_load_avg?

sched_entity's load is updated before being enqueued, so the up-to-date
value will be added to cfs_rq->runnable_load_avg... unless se is already
enqueued. So cfs_rq->runnable_load_avg should also be updated if se is
already on_rq. I'm going to add this case.

Thanks for pointing this out.

>
> [...]
>
>> +/* Take into account change of load of a child task group */
>> +static inline void
>> +update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +{
>> +	struct cfs_rq *gcfs_rq = group_cfs_rq(se);
>> +	long delta, load = gcfs_rq->avg.load_avg;
>> +
>> +	/* If the load of group cfs_rq is null, the load of the
>> +	 * sched_entity will also be null so we can skip the formula
>> +	 */
>> +	if (load) {
>> +		long tg_load;
>> +
>> +		/* Get tg's load and ensure tg_load > 0 */
>> +		tg_load = atomic_long_read(&gcfs_rq->tg->load_avg) + 1;
>> +
>> +		/* Ensure tg_load >= load and updated with current load*/
>> +		tg_load -= gcfs_rq->tg_load_avg_contrib;
>> +		tg_load += load;
>> +
>> +		/* scale gcfs_rq's load into tg's shares*/
>> +		load *= scale_load_down(gcfs_rq->tg->shares);
>> +		load /= tg_load;
>> +
>> +		/*
>> +		 * we need to compute a correction term in the case that the
>> +		 * task group is consuming <1 cpu so that we would contribute
>> +		 * the same load as a task of equal weight.
>
> Wasn't 'consuming <1' related to 'NICE_0_LOAD' and not
> scale_load_down(gcfs_rq->tg->shares) before the rewrite of PELT (v4.2,
> __update_group_entity_contrib())?

Yes, before the rewrite, the condition (tg->runnable_avg < NICE_0_LOAD) was
used.

I have used the following examples to choose the condition:

A task group with only one always-running task TA, with a weight equal to
tg->shares, will have a tg load (cfs_rq->tg->load_avg) equal to TA's
weight == scale_load_down(tg->shares): the load of the CPU on which the
task runs will be scale_load_down(task's weight) ==
scale_load_down(tg->shares), and the load of the other CPUs will be null.
In this case, all shares will be given to the cfs_rq CFS1 on which TA runs,
and the load of the sched_entity SB that represents CFS1 at parent level
will be scale_load_down(SB's weight) = scale_load_down(tg->shares).

If TA is not an always-running task, its load will be less than its weight
and less than scale_load_down(tg->shares), and as a result tg->load_avg
will be less than scale_load_down(tg->shares). Nevertheless, the weight of
SB is still scale_load_down(tg->shares), and its load should be the same as
TA's. But the first part of the calculation gives a load of
scale_load_down(gcfs_rq->tg->shares) because tg_load ==
gcfs_rq->tg_load_avg_contrib == load. So if tg_load <
scale_load_down(gcfs_rq->tg->shares), we have to correct the load that we
set to SB.

>
>> +		 */
>> +		if (tg_load < scale_load_down(gcfs_rq->tg->shares)) {
>> +			load *= tg_load;
>> +			load /= scale_load_down(gcfs_rq->tg->shares);
>> +		}
>> +	}
>
> [...]
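
To make the two examples above concrete, here is a rough user-space sketch
of the arithmetic in the quoted hunk. It is only an illustration, not the
kernel code: propagate_load() and its parameters are hypothetical names,
and "shares" stands for scale_load_down(gcfs_rq->tg->shares), assumed to be
1024 in the numbers below.

```c
/*
 * Hypothetical user-space model of the arithmetic in the quoted
 * update_tg_cfs_load() hunk. All names here are illustrative only.
 *
 *   load        - gcfs_rq->avg.load_avg
 *   tg_load_avg - value read from gcfs_rq->tg->load_avg
 *   contrib     - gcfs_rq->tg_load_avg_contrib
 *   shares      - scale_load_down(gcfs_rq->tg->shares)
 */
static long propagate_load(long load, long tg_load_avg, long contrib,
			   long shares)
{
	long tg_load;

	/* A null group load gives a null sched_entity load. */
	if (!load)
		return 0;

	/* Get tg's load and ensure tg_load > 0. */
	tg_load = tg_load_avg + 1;

	/* Replace the stale contribution with the current load. */
	tg_load -= contrib;
	tg_load += load;

	/* Scale the group cfs_rq's load into the tg's shares. */
	load *= shares;
	load /= tg_load;

	/* Correction when the group's load is below its shares. */
	if (tg_load < shares) {
		load *= tg_load;
		load /= shares;
	}

	return load;
}
```

With shares = 1024, the always-running case propagate_load(1024, 1024,
1024, 1024) gives 1023, so the correction is not applied and SB gets
roughly the full shares. For a task that runs half of the time,
propagate_load(512, 512, 512, 1024) gives 511, close to TA's load of 512;
the first step alone would have left 1022.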