Hi Quentin,

On Thu, Sep 12, 2019 at 11:44:04AM +0200, Quentin Perret wrote:
> From: Quentin Perret <quentin.per...@arm.com>
> 
> EAS computes the energy impact of migrating a waking task when deciding
> on which CPU it should run. However, the current approach is known to
> have a high algorithmic complexity, which can result in prohibitively
> high wake-up latencies on systems with complex energy models, such as
> systems with per-CPU DVFS. On such systems, the algorithm complexity is
> in O(n^2) (ignoring the cost of searching for performance states in the
> EM) with 'n' the number of CPUs.
> 
> To address this, re-factor the EAS wake-up path to compute the energy
> 'delta' (with and without the task) on a per-performance domain basis,
> rather than system-wide, which brings the complexity down to O(n).
> 
> No functional changes intended.
> 
> Signed-off-by: Quentin Perret <quentin.per...@arm.com>
>  

[snip]

>  /*
> @@ -6381,21 +6367,19 @@ compute_energy(struct task_struct *p, int dst_cpu, 
> struct perf_domain *pd)
>   * other use-cases too. So, until someone finds a better way to solve this,
>   * let's keep things simple by re-using the existing slow path.
>   */
> -
>  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  {
> -     unsigned long prev_energy = ULONG_MAX, best_energy = ULONG_MAX;
> +     unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
>       struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
> +     unsigned long cpu_cap, util, base_energy = 0;
>       int cpu, best_energy_cpu = prev_cpu;
> -     struct perf_domain *head, *pd;
> -     unsigned long cpu_cap, util;
>       struct sched_domain *sd;
> +     struct perf_domain *pd;
>  
>       rcu_read_lock();
>       pd = rcu_dereference(rd->pd);
>       if (!pd || READ_ONCE(rd->overutilized))
>               goto fail;
> -     head = pd;
>  
>       /*
>        * Energy-aware wake-up happens on the lowest sched_domain starting
> @@ -6412,9 +6396,14 @@ static int find_energy_efficient_cpu(struct 
> task_struct *p, int prev_cpu)
>               goto unlock;
>  
>       for (; pd; pd = pd->next) {
> -             unsigned long cur_energy, spare_cap, max_spare_cap = 0;
> +             unsigned long cur_delta, spare_cap, max_spare_cap = 0;
> +             unsigned long base_energy_pd;
>               int max_spare_cap_cpu = -1;
>  
> +             /* Compute the 'base' energy of the pd, without @p */
> +             base_energy_pd = compute_energy(p, -1, pd);
> +             base_energy += base_energy_pd;
> +
>               for_each_cpu_and(cpu, perf_domain_span(pd), 
> sched_domain_span(sd)) {
>                       if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>                               continue;
> @@ -6427,9 +6416,9 @@ static int find_energy_efficient_cpu(struct task_struct 
> *p, int prev_cpu)
>  
>                       /* Always use prev_cpu as a candidate. */
>                       if (cpu == prev_cpu) {
> -                             prev_energy = compute_energy(p, prev_cpu, head);
> -                             best_energy = min(best_energy, prev_energy);
> -                             continue;
> +                             prev_delta = compute_energy(p, prev_cpu, pd);
> +                             prev_delta -= base_energy_pd;
> +                             best_delta = min(best_delta, prev_delta);
>                       }

Earlier, we are not checking the spare capacity for the prev_cpu. Now that the
continue statement is removed, prev_cpu could also be the max_spare_cap_cpu.
Actually that makes sense. Because there is no reason why we want to select
another CPU which has less spare capacity than previous CPU.

Is this behavior intentional?

When prev_cpu == max_spare_cap_cpu, we are evaluating the energy again for the
same CPU below. That could have been skipped by returning prev_cpu when
prev_cpu == max_spare_cap_cpu.

>  
>                       /*
> @@ -6445,9 +6434,10 @@ static int find_energy_efficient_cpu(struct 
> task_struct *p, int prev_cpu)
>  
>               /* Evaluate the energy impact of using this CPU. */
>               if (max_spare_cap_cpu >= 0) {
> -                     cur_energy = compute_energy(p, max_spare_cap_cpu, head);
> -                     if (cur_energy < best_energy) {
> -                             best_energy = cur_energy;
> +                     cur_delta = compute_energy(p, max_spare_cap_cpu, pd);
> +                     cur_delta -= base_energy_pd;
> +                     if (cur_delta < best_delta) {
> +                             best_delta = cur_delta;
>                               best_energy_cpu = max_spare_cap_cpu;
>                       }
>               }
> @@ -6459,10 +6449,10 @@ static int find_energy_efficient_cpu(struct 
> task_struct *p, int prev_cpu)
>        * Pick the best CPU if prev_cpu cannot be used, or if it saves at
>        * least 6% of the energy used by prev_cpu.
>        */
> -     if (prev_energy == ULONG_MAX)
> +     if (prev_delta == ULONG_MAX)
>               return best_energy_cpu;
>  
> -     if ((prev_energy - best_energy) > (prev_energy >> 4))
> +     if ((prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
>               return best_energy_cpu;
>  
>       return prev_cpu;
> -- 
> 2.22.1
> 

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Reply via email to