On 23/08/16 21:40, Paul Turner wrote: > On Mon, Aug 22, 2016 at 7:00 AM, Dietmar Eggemann
[...] >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 61d485421bed..18f80c4c7737 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -2530,8 +2530,8 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, >> struct task_group *tg) >> if (tg_weight) >> shares /= tg_weight; >> >> - if (shares < MIN_SHARES) >> - shares = MIN_SHARES; >> + if (shares < scale_load(MIN_SHARES)) >> + shares = scale_load(MIN_SHARES); >> if (shares > tg->shares) >> shares = tg->shares; > > > MIN_SHARES is never scaled; it is an independent floor on the value > that can be assigned as a weight, so we never need to scale it down > (this would actually allow the weight to drop to zero which would be > bad). True but I'm using scale_load() and not scale_load_down() here. I was under the impression that we have different floor values for the two fixed point arithmetics now. It's already used in sched_group_set_shares() today, so I thought this lower floor value is 2048 instead of 2 for 20 bit and we should use the same value in calc_cfs_shares(). sched_group_set_shares() shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES)); ... tg->shares = shares; >> @@ -5023,9 +5023,9 @@ static long effective_load(struct task_group *tg, int >> cpu, long wl, long wg) >> * wl = S * s'_i; see (2) >> */ >> if (W > 0 && w < W) >> - wl = (w * (long)tg->shares) / W; >> + wl = (w * (long)scale_load_down(tg->shares)) / W; >> else >> - wl = tg->shares; >> + wl = scale_load_down(tg->shares); > > > This part looks good, effective_load() works with > source_load/target_load values, which originate in unscaled values. Yes. > > Independent of this patch: > When we initially introduced load scaling, it was ~uniform on every > value. Most of the current pain has come from, and will continue to > come from, that with v2 of the load-tracking this is no longer the > case. We have a massive number of scaled and unscaled inputs floating > around, many of them derived values (e.g. source_load above) which > require chasing. > > I propose we simplify this. Load scaling is only here so that the > load-balancer can make sane decisions. We only need > cfs_rq->load.weight values to be scaled. I see, i.e. no scaling of tg->shares any more. > > This means: > (1) scaling in calculating group se->se.weight (calc_cfs_shares) > (2) probable scaling in h_load calculations > (2) if you have a calculation involving a cfs_rq->load.weight value, > you may need to think about scaling. > [There's a bunch of obvious places this covers, most of them > are the load-balancer. There's some indirects, e.g. the removed need > to scale in calculating vruntime, but these are easy to audit just by > searching for existing calls to scale.] > > Probably missed one, but the fact that this list can be written means > its 1000 pages shorter than today's requirements. > > The fact that (3) becomes the only rule to remember for the most part > makes reasoning about all of this stuff possible again because right > now it sucks.