Hi Quentin, On Thu, Sep 12, 2019 at 11:44:04AM +0200, Quentin Perret wrote: > From: Quentin Perret <quentin.per...@arm.com> > > EAS computes the energy impact of migrating a waking task when deciding > on which CPU it should run. However, the current approach is known to > have a high algorithmic complexity, which can result in prohibitively > high wake-up latencies on systems with complex energy models, such as > systems with per-CPU DVFS. On such systems, the algorithm complexity is > in O(n^2) (ignoring the cost of searching for performance states in the > EM) with 'n' the number of CPUs. > > To address this, re-factor the EAS wake-up path to compute the energy > 'delta' (with and without the task) on a per-performance domain basis, > rather than system-wide, which brings the complexity down to O(n). > > No functional changes intended. > > Signed-off-by: Quentin Perret <quentin.per...@arm.com> >
[snip] > /* > @@ -6381,21 +6367,19 @@ compute_energy(struct task_struct *p, int dst_cpu, > struct perf_domain *pd) > * other use-cases too. So, until someone finds a better way to solve this, > * let's keep things simple by re-using the existing slow path. > */ > - > static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) > { > - unsigned long prev_energy = ULONG_MAX, best_energy = ULONG_MAX; > + unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX; > struct root_domain *rd = cpu_rq(smp_processor_id())->rd; > + unsigned long cpu_cap, util, base_energy = 0; > int cpu, best_energy_cpu = prev_cpu; > - struct perf_domain *head, *pd; > - unsigned long cpu_cap, util; > struct sched_domain *sd; > + struct perf_domain *pd; > > rcu_read_lock(); > pd = rcu_dereference(rd->pd); > if (!pd || READ_ONCE(rd->overutilized)) > goto fail; > - head = pd; > > /* > * Energy-aware wake-up happens on the lowest sched_domain starting > @@ -6412,9 +6396,14 @@ static int find_energy_efficient_cpu(struct > task_struct *p, int prev_cpu) > goto unlock; > > for (; pd; pd = pd->next) { > - unsigned long cur_energy, spare_cap, max_spare_cap = 0; > + unsigned long cur_delta, spare_cap, max_spare_cap = 0; > + unsigned long base_energy_pd; > int max_spare_cap_cpu = -1; > > + /* Compute the 'base' energy of the pd, without @p */ > + base_energy_pd = compute_energy(p, -1, pd); > + base_energy += base_energy_pd; > + > for_each_cpu_and(cpu, perf_domain_span(pd), > sched_domain_span(sd)) { > if (!cpumask_test_cpu(cpu, p->cpus_ptr)) > continue; > @@ -6427,9 +6416,9 @@ static int find_energy_efficient_cpu(struct task_struct > *p, int prev_cpu) > > /* Always use prev_cpu as a candidate. */ > if (cpu == prev_cpu) { > - prev_energy = compute_energy(p, prev_cpu, head); > - best_energy = min(best_energy, prev_energy); > - continue; > + prev_delta = compute_energy(p, prev_cpu, pd); > + prev_delta -= base_energy_pd; > + best_delta = min(best_delta, prev_delta); > } Earlier, we are not checking the spare capacity for the prev_cpu. Now that the continue statement is removed, prev_cpu could also be the max_spare_cap_cpu. Actually that makes sense. Because there is no reason why we want to select another CPU which has less spare capacity than previous CPU. Is this behavior intentional? When prev_cpu == max_spare_cap_cpu, we are evaluating the energy again for the same CPU below. That could have been skipped by returning prev_cpu when prev_cpu == max_spare_cap_cpu. > > /* > @@ -6445,9 +6434,10 @@ static int find_energy_efficient_cpu(struct > task_struct *p, int prev_cpu) > > /* Evaluate the energy impact of using this CPU. */ > if (max_spare_cap_cpu >= 0) { > - cur_energy = compute_energy(p, max_spare_cap_cpu, head); > - if (cur_energy < best_energy) { > - best_energy = cur_energy; > + cur_delta = compute_energy(p, max_spare_cap_cpu, pd); > + cur_delta -= base_energy_pd; > + if (cur_delta < best_delta) { > + best_delta = cur_delta; > best_energy_cpu = max_spare_cap_cpu; > } > } > @@ -6459,10 +6449,10 @@ static int find_energy_efficient_cpu(struct > task_struct *p, int prev_cpu) > * Pick the best CPU if prev_cpu cannot be used, or if it saves at > * least 6% of the energy used by prev_cpu. > */ > - if (prev_energy == ULONG_MAX) > + if (prev_delta == ULONG_MAX) > return best_energy_cpu; > > - if ((prev_energy - best_energy) > (prev_energy >> 4)) > + if ((prev_delta - best_delta) > ((prev_delta + base_energy) >> 4)) > return best_energy_cpu; > > return prev_cpu; > -- > 2.22.1 > Thanks, Pavan -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.