Replying to the latest version available; given the current interest I figured I'd re-read some of the old threads and look at this stuff again.
On Fri, Apr 28, 2017 at 04:23:55PM +0200, Vincent Guittot wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0978fb7..f8dde36 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -313,6 +313,7 @@ struct load_weight {
>   */
>  struct sched_avg {
>  	u64		last_update_time;
> +	u64		stolen_idle_time;
>  	u64		load_sum;
>  	u32		util_sum;
>  	u32		period_contrib;

Right, so sadly Patrick stole that space with the util_est bits.

Also, given the comment here:

  https://marc.info/?l=linux-kernel&m=149373232422941&w=2

this should be a u32, right? That would also make it slightly easier to
find a hole for.

> +/*
> + * Scale the time to reflect the effective amount of computation done during
> + * this delta time.

I would much appreciate a more extended comment here. One that includes
pictures of the moving window edges, as in:

  https://marc.info/?l=linux-kernel&m=149200866116792&w=2
  https://marc.info/?l=linux-kernel&m=149201190517985&w=2

> + */
> +static __always_inline u64
> +scale_time(u64 delta, int cpu, struct sched_avg *sa,
> +	   unsigned long weight, int running)
> +{
> +	if (running) {
> +		/*
> +		 * When an entity runs at a lower compute capacity, it will
> +		 * need more time to do the same amount of work than it would
> +		 * at max capacity. In order to be invariant, we scale the
> +		 * delta to reflect how much work has really been done.
> +		 * Running at lower capacity also means running longer to do
> +		 * the same amount of work, and this results in stealing some
> +		 * idle time that will disturb the load signal compared to
> +		 * max capacity; we also track this amount of stolen time to
> +		 * reflect it when the entity goes back to sleep.
> +		 *
> +		 * stolen time = (current run time) - (effective time at max
> +		 *		 capacity)
> +		 */
> +		sa->stolen_idle_time += delta;
> +
> +		/*
> +		 * scale the elapsed time to reflect the real amount of
> +		 * computation
> +		 */
> +		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
> +
> +		/*
> +		 * Track the amount of stolen idle time due to running at
> +		 * lower capacity
> +		 */
> +		sa->stolen_idle_time -= delta;
> +	} else if (!weight) {
> +		/*
> +		 * Entity is sleeping, so both utilization and load will decay
> +		 * and we can safely add the stolen time. Reflecting some
> +		 * stolen time makes sense only if this idle phase would be
> +		 * present at max capacity. As soon as the utilization of an
> +		 * entity has reached the maximum value, it is considered an
> +		 * always running entity without idle time to steal.
> +		 */
> +		if (sa->util_avg < (SCHED_CAPACITY_SCALE - 1)) {
> +			/*
> +			 * Add the idle time stolen by running at lower
> +			 * compute capacity
> +			 */
> +			delta += sa->stolen_idle_time;
> +		}
> +		sa->stolen_idle_time = 0;
> +	}

What happened to the proposed changes here:

  https://marc.info/?l=linux-kernel&m=149383148721909&w=2

to deal with the load scaling issues?

> +
> +	return delta;
> +}
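
For anyone following along, the running-path arithmetic pulled out into
a standalone sketch; the capacity values are made up for illustration,
and cap_scale() is reimplemented locally after the kernel's
(x * cap) >> SCHED_CAPACITY_SHIFT helper in kernel/sched/fair.c:

  #include <stdio.h>
  #include <stdint.h>

  #define SCHED_CAPACITY_SHIFT	10

  /* mirrors the kernel's cap_scale(v, cap) */
  static uint64_t cap_scale(uint64_t delta, unsigned long cap)
  {
  	return (delta * cap) >> SCHED_CAPACITY_SHIFT;
  }

  int main(void)
  {
  	uint64_t delta = 1000;		/* ns of wall-clock run time */
  	unsigned long freq_cap = 512;	/* hypothetical: half of max frequency */
  	unsigned long cpu_cap = 768;	/* hypothetical: little CPU capacity */
  	uint64_t stolen = delta;

  	/* effective time at max capacity */
  	delta = cap_scale(delta, freq_cap);
  	delta = cap_scale(delta, cpu_cap);

  	/* stolen time = run time - effective time at max capacity */
  	stolen -= delta;

  	printf("effective: %llu ns, stolen: %llu ns\n",
  	       (unsigned long long)delta, (unsigned long long)stolen);
  	return 0;
  }

Running 1000ns of wall-clock time at half frequency on a 768-capacity
CPU yields 375ns of effective work and 625ns of stolen idle time, the
amount the patch adds back into delta on sleep, provided util_avg has
not saturated.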