fair: Optimize __update_sched_avg()

Dietmar Eggemann Mon, 10 Apr 2017 00:39:33 -0700


On 31/03/17 12:23, Peter Zijlstra wrote:
> On Fri, Mar 31, 2017 at 02:58:57AM -0700, Paul Turner wrote:
>>> So lets pull it out again -- but I don't think we need to undo all of
>>> yuyang's patches for that. So please, look at the patch I proposed for
>>> the problem you spotted. Lets fix the current state and take it from
>>> there, ok?
>>
>> Oh, absolutely.  I'm not really not proposing re-vulcanizing any
>> rubber here.  Just saying that this particular problem is just one
>> facet of the weight mixing.  100% agree on fixing this as is and
>> iterating.  I was actually thinking that these changes (which really
>> nicely simplify the calculation) greatly simplify restoring the
>> separation there.
> 
> So on top of the previous patch; I've arrived (without testing and
> considering numerical overflow much) at the following patch.


I did some testing (performance gov, no big.little) with this and for a
single task the load_avg values are much higher now on 64bit because of
the missing scale_load_down on weight.
Are you planning to get rid of scale_load_down() for weight because of
the correctness problems of the signal mentioned by Paul in this thread?

"... c) It doesn't work with scale_load_down and fractional shares below
SCHED_LOAD_SCALE [we multiply in a zero -> zero rq load] ... "

> 
> Its known broken for things like update_cfs_rq_load_avg() where doing
> this will unconditionally flatten load_sum, which seems undesired. Need
> to think more on this.
> 
> Also, I've pulled out the cpufreq scaling, which I'm not sure is the
> right thing to do. Vincent wants to scale time, which would be
> persistent in the history, unlike this now.

This patch moves freq and cpu from accumulate_sum() to ___update_load_avg()?

[...]

> @@ -2820,15 +2820,11 @@ static u32 __accumulate_pelt_segments(u6
>   */
>  static __always_inline u32
>  accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
> -            unsigned long weight, int running, struct cfs_rq *cfs_rq)
> +            bool load, bool running, struct cfs_rq *cfs_rq)
>  {
> -     unsigned long scale_freq, scale_cpu;
> -     u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
> +     u32 contrib = (u32)delta;
>       u64 periods;
>  
> -     scale_freq = arch_scale_freq_capacity(NULL, cpu);
> -     scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> -
>       delta += sa->period_contrib;
>       periods = delta / 1024; /* A period is 1024us (~1ms) */
>  
> @@ -2852,14 +2848,13 @@ accumulate_sum(u64 delta, int cpu, struc
>       }
>       sa->period_contrib = delta;
>  
> -     contrib = cap_scale(contrib, scale_freq);
> -     if (weight) {
> -             sa->load_sum += weight * contrib;
> +     if (load) {
> +             sa->load_sum += contrib;
>               if (cfs_rq)
> -                     cfs_rq->runnable_load_sum += weight * contrib;
> +                     cfs_rq->runnable_load_sum += contrib;
>       }
>       if (running)
> -             sa->util_sum += contrib * scale_cpu;
> +             sa->util_sum += contrib;
>  
>       return periods;
>  }
> @@ -2894,9 +2889,10 @@ accumulate_sum(u64 delta, int cpu, struc
>   */
>  static __always_inline int
>  ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> -               unsigned long weight, int running, struct cfs_rq *cfs_rq)
> +               unsigned long weight, bool running, struct cfs_rq *cfs_rq)
>  {
> -     u64 delta;
> +     unsigned long scale_freq, scale_cpu;
> +     u64 delta, load;
>  
>       delta = now - sa->last_update_time;
>       /*
> @@ -2924,18 +2920,22 @@ ___update_load_avg(u64 now, int cpu, str
>        * Step 1: accumulate *_sum since last_update_time. If we haven't
>        * crossed period boundaries, finish.
>        */
> -     if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
> +     if (!accumulate_sum(delta, cpu, sa, !!weight, running, cfs_rq))
>               return 0;
>  
> +     scale_freq = arch_scale_freq_capacity(NULL, cpu);
> +     scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> +
>       /*
>        * Step 2: update *_avg.
>        */
> -     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +     load = cap_scale(weight, scale_freq);
> +     sa->load_avg = div_u64(load * sa->load_sum, LOAD_AVG_MAX);
>       if (cfs_rq) {
>               cfs_rq->runnable_load_avg =
> -                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> +                     div_u64(load * cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>       }
> -     sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> +     sa->util_avg = (cap_scale(scale_cpu, scale_freq) * sa->util_sum) / 
> LOAD_AVG_MAX;
>  
>       return 1;
>  }

[...]

Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()

Reply via email to