Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On Tuesday 22 Jan 2019 at 15:01:37 (+), Patrick Bellasi wrote: > > I'm not saying it's useful, I'm saying userspace can decide to do that > > if it thinks it is a good idea. The default should be min_cap = 1024 for > > RT, no questions. But you _can_ change it at runtime if you want to. > > That's my point. And doing that basically provides the same behaviour as > > what we have right now in terms of EAS calculation (but it changes the > > freq selection obviously) which is why I'm not fundamentally opposed to > > your patch. > > Well, I think it's tricky to say whether the current or new approach > is better... it probably depends on the use-case. Agreed. > > So in short, I'm fine with the behavioural change, but please at least > > mention it somewhere :-) > > Anyway... agree, it's just that to add some documentation I need to > get what you are pointing out ;) > > Will come up with some additional text to be added to the changelog Sounds good. > Maybe we can add a more detailed explanation of the different > behaviors you can get in the EAS documentation which is coming to > mainline ? Yeah, if you feel like it, I guess that won't hurt :-) Thanks, Quentin
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On 22-Jan 14:39, Quentin Perret wrote: > On Tuesday 22 Jan 2019 at 14:26:06 (+), Patrick Bellasi wrote: > > On 22-Jan 13:29, Quentin Perret wrote: > > > On Tuesday 22 Jan 2019 at 12:45:46 (+), Patrick Bellasi wrote: > > > > On 22-Jan 12:13, Quentin Perret wrote: > > > > > On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > > > > > > The Energy Aware Scheduler (AES) estimates the energy impact of > > > > > > waking [...] > > > Ah, sorry, I guess my message was misleading. I'm saying this is > > > changing the way _EAS_ deals with RT tasks. Right now we don't actually > > > consider the RT-go-to-max thing at all in the EAS prediction. Your > > > patch is changing that AFAICT. It actually changes the way EAS sees RT > > > tasks even without uclamp ... > > > > Lemme see if I get it right. > > > > Currently, whenever we look at CPU utilization for ENERGY_UTIL, we > > always use cpu_util_rt() for RT tasks: > > > > ---8<--- > > util = util_cfs; > > util += cpu_util_rt(rq); > > util += dl_util; > > ---8<--- > > > > Thus, even when RT tasks are RUNNABLE, we don't always assume the CPU > > running at the max capacity but just whatever is the aggregated > > utilization across all the classes. > > > > With uclamp, we have: > > > > ---8<--- > > util = cpu_util_rt(rq) + util_cfs; > > if (type == FREQUENCY_UTIL) > > util = uclamp_util_with(rq, util, p); > > dl_util = cpu_util_dl(rq); > > if (type == ENERGY_UTIL) > > util += dl_util; > > ---8<--- > > > > So, I would say that, in terms of ENERGY_UTIL we do the same both > > w/ and w/o uclamp. Isn't it? > > Yes but now you use FREQUENCY_UTIL for computing 'max_util' in the EAS > prediction. Right, I overlook that "little" detail... :/ > Let's take an example. You have a perf domain with two CPUs. One CPU is > busy running a RT task, the other CPU runs a CFS task. Right now in > compute_energy() we only use ENERGY_UTIL, so 'max_util' ends up being > the max between the utilization of the two tasks. So we don't predict > we're going to max freq. +1 > With your patch, we use FREQUENCY_UTIL to compute 'max_util', so we > _will_ predict that we're going to max freq. Right, with the default conf yes. > And we will do that even if CONFIG_UCLAMP_TASK=n. While this should not happen, as I wrote in the RT integration patch, that's happening because I'm missing some compilation guard or similar. In this configurations we should always go to max... will look into that. > The default EAS calculation will be different with your patch when there > are runnable RT tasks in the system. This needs to be documented, I > think. Sure... > > > But I'm not hostile to the idea since it's possible to enable uclamp and > > > set the min cap to 0 for RT if you want. So there is a story there. > > > However, I think this needs be documented somewhere, at the very least. > > > > The only difference I see is that the actual frequency could be > > different (lower then max) when a clamped RT task is RUNNABLE. > > > > Are you worried that running RT on a lower freq could have side > > effects on the estimated busy time the CPU ? > > > > I also still don't completely get why you say it could be useful to > >"set the min cap to 0 for RT if you want" > > I'm not saying it's useful, I'm saying userspace can decide to do that > if it thinks it is a good idea. The default should be min_cap = 1024 for > RT, no questions. But you _can_ change it at runtime if you want to. > That's my point. And doing that basically provides the same behaviour as > what we have right now in terms of EAS calculation (but it changes the > freq selection obviously) which is why I'm not fundamentally opposed to > your patch. Well, I think it's tricky to say whether the current or new approach is better... it probably depends on the use-case. > So in short, I'm fine with the behavioural change, but please at least > mention it somewhere :-) Anyway... agree, it's just that to add some documentation I need to get what you are pointing out ;) Will come up with some additional text to be added to the changelog Maybe we can add a more detailed explanation of the different behaviors you can get in the EAS documentation which is coming to mainline ? > Thanks, > Quentin -- #include Patrick Bellasi
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On Tuesday 22 Jan 2019 at 14:26:06 (+), Patrick Bellasi wrote: > On 22-Jan 13:29, Quentin Perret wrote: > > On Tuesday 22 Jan 2019 at 12:45:46 (+), Patrick Bellasi wrote: > > > On 22-Jan 12:13, Quentin Perret wrote: > > > > On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > > > > > The Energy Aware Scheduler (AES) estimates the energy impact of waking > > > > > > [...] > > > > > > > > + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) { > > > > > + cfs_util = cpu_util_next(cpu, p, dst_cpu); > > > > > + > > > > > + /* > > > > > + * Busy time computation: utilization clamping > > > > > is not > > > > > + * required since the ratio (sum_util / > > > > > cpu_capacity) > > > > > + * is already enough to scale the EM reported > > > > > power > > > > > + * consumption at the (eventually clamped) > > > > > cpu_capacity. > > > > > + */ > > > > > > > > Right. > > > > > > > > > + sum_util += schedutil_cpu_util(cpu, cfs_util, > > > > > cpu_cap, > > > > > +ENERGY_UTIL, > > > > > NULL); > > > > > + > > > > > + /* > > > > > + * Performance domain frequency: utilization > > > > > clamping > > > > > + * must be considered since it affects the > > > > > selection > > > > > + * of the performance domain frequency. > > > > > + */ > > > > > > > > So that actually affects the way we deal with RT I think. I assume the > > > > idea is to say if you don't want to reflect the RT-go-to-max-freq thing > > > > in EAS (which is what we do now) you should set the min cap for RT to 0. > > > > Is that correct ? > > > > > > By default configuration, RT tasks still go to max when uclamp is > > > enabled, since they get a util_min=1024. > > > > > > If we want to save power on RT tasks, we can set a smaller util_min... > > > but not necessarily 0. A util_min=0 for RT tasks means to use just > > > cpu_util_rt() for that class. > > > > Ah, sorry, I guess my message was misleading. I'm saying this is > > changing the way _EAS_ deals with RT tasks. Right now we don't actually > > consider the RT-go-to-max thing at all in the EAS prediction. Your > > patch is changing that AFAICT. It actually changes the way EAS sees RT > > tasks even without uclamp ... > > Lemme see if I get it right. > > Currently, whenever we look at CPU utilization for ENERGY_UTIL, we > always use cpu_util_rt() for RT tasks: > > ---8<--- > util = util_cfs; > util += cpu_util_rt(rq); > util += dl_util; > ---8<--- > > Thus, even when RT tasks are RUNNABLE, we don't always assume the CPU > running at the max capacity but just whatever is the aggregated > utilization across all the classes. > > With uclamp, we have: > > ---8<--- > util = cpu_util_rt(rq) + util_cfs; > if (type == FREQUENCY_UTIL) > util = uclamp_util_with(rq, util, p); > dl_util = cpu_util_dl(rq); > if (type == ENERGY_UTIL) > util += dl_util; > ---8<--- > > So, I would say that, in terms of ENERGY_UTIL we do the same both > w/ and w/o uclamp. Isn't it? Yes but now you use FREQUENCY_UTIL for computing 'max_util' in the EAS prediction. Let's take an example. You have a perf domain with two CPUs. One CPU is busy running a RT task, the other CPU runs a CFS task. Right now in compute_energy() we only use ENERGY_UTIL, so 'max_util' ends up being the max between the utilization of the two tasks. So we don't predict we're going to max freq. With your patch, we use FREQUENCY_UTIL to compute 'max_util', so we _will_ predict that we're going to max freq. And we will do that even if CONFIG_UCLAMP_TASK=n. The default EAS calculation will be different with your patch when there are runnable RT tasks in the system. This needs to be documented, I think. > > But I'm not hostile to the idea since it's possible to enable uclamp and > > set the min cap to 0 for RT if you want. So there is a story there. > > However, I think this needs be documented somewhere, at the very least. > > The only difference I see is that the actual frequency could be > different (lower then max) when a clamped RT task is RUNNABLE. > > Are you worried that running RT on a lower freq could have side > effects on the estimated busy time the CPU ? > > I also still don't completely get why you say it could be useful to >"set the min cap to 0 for RT if you want" I'm not saying it's useful, I'm saying userspace can decide to do that if it thinks it is a good idea. The default should be min_cap = 1024 for RT, no questions. But you _can_ change it at runtime if you want to. That's my point. And doing that basically provides the same behaviour as what we have right now in terms of EAS calculation (but i
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On 22-Jan 13:29, Quentin Perret wrote: > On Tuesday 22 Jan 2019 at 12:45:46 (+), Patrick Bellasi wrote: > > On 22-Jan 12:13, Quentin Perret wrote: > > > On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > > > > The Energy Aware Scheduler (AES) estimates the energy impact of waking > > > > [...] > > > > > > + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) { > > > > + cfs_util = cpu_util_next(cpu, p, dst_cpu); > > > > + > > > > + /* > > > > +* Busy time computation: utilization clamping > > > > is not > > > > +* required since the ratio (sum_util / > > > > cpu_capacity) > > > > +* is already enough to scale the EM reported > > > > power > > > > +* consumption at the (eventually clamped) > > > > cpu_capacity. > > > > +*/ > > > > > > Right. > > > > > > > + sum_util += schedutil_cpu_util(cpu, cfs_util, > > > > cpu_cap, > > > > + ENERGY_UTIL, > > > > NULL); > > > > + > > > > + /* > > > > +* Performance domain frequency: utilization > > > > clamping > > > > +* must be considered since it affects the > > > > selection > > > > +* of the performance domain frequency. > > > > +*/ > > > > > > So that actually affects the way we deal with RT I think. I assume the > > > idea is to say if you don't want to reflect the RT-go-to-max-freq thing > > > in EAS (which is what we do now) you should set the min cap for RT to 0. > > > Is that correct ? > > > > By default configuration, RT tasks still go to max when uclamp is > > enabled, since they get a util_min=1024. > > > > If we want to save power on RT tasks, we can set a smaller util_min... > > but not necessarily 0. A util_min=0 for RT tasks means to use just > > cpu_util_rt() for that class. > > Ah, sorry, I guess my message was misleading. I'm saying this is > changing the way _EAS_ deals with RT tasks. Right now we don't actually > consider the RT-go-to-max thing at all in the EAS prediction. Your > patch is changing that AFAICT. It actually changes the way EAS sees RT > tasks even without uclamp ... Lemme see if I get it right. Currently, whenever we look at CPU utilization for ENERGY_UTIL, we always use cpu_util_rt() for RT tasks: ---8<--- util = util_cfs; util += cpu_util_rt(rq); util += dl_util; ---8<--- Thus, even when RT tasks are RUNNABLE, we don't always assume the CPU running at the max capacity but just whatever is the aggregated utilization across all the classes. With uclamp, we have: ---8<--- util = cpu_util_rt(rq) + util_cfs; if (type == FREQUENCY_UTIL) util = uclamp_util_with(rq, util, p); dl_util = cpu_util_dl(rq); if (type == ENERGY_UTIL) util += dl_util; ---8<--- So, I would say that, in terms of ENERGY_UTIL we do the same both w/ and w/o uclamp. Isn't it? > But I'm not hostile to the idea since it's possible to enable uclamp and > set the min cap to 0 for RT if you want. So there is a story there. > However, I think this needs be documented somewhere, at the very least. The only difference I see is that the actual frequency could be different (lower then max) when a clamped RT task is RUNNABLE. Are you worried that running RT on a lower freq could have side effects on the estimated busy time the CPU ? I also still don't completely get why you say it could be useful to "set the min cap to 0 for RT if you want" IMO this will be an even bigger difference wrt mainline, since the RT tasks will never have a granted minimum freq but just whatever utilization we measure for them. -- #include Patrick Bellasi
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On Tuesday 22 Jan 2019 at 12:45:46 (+), Patrick Bellasi wrote: > On 22-Jan 12:13, Quentin Perret wrote: > > On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > > > The Energy Aware Scheduler (AES) estimates the energy impact of waking > > [...] > > > > + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) { > > > + cfs_util = cpu_util_next(cpu, p, dst_cpu); > > > + > > > + /* > > > + * Busy time computation: utilization clamping is not > > > + * required since the ratio (sum_util / cpu_capacity) > > > + * is already enough to scale the EM reported power > > > + * consumption at the (eventually clamped) cpu_capacity. > > > + */ > > > > Right. > > > > > + sum_util += schedutil_cpu_util(cpu, cfs_util, cpu_cap, > > > +ENERGY_UTIL, NULL); > > > + > > > + /* > > > + * Performance domain frequency: utilization clamping > > > + * must be considered since it affects the selection > > > + * of the performance domain frequency. > > > + */ > > > > So that actually affects the way we deal with RT I think. I assume the > > idea is to say if you don't want to reflect the RT-go-to-max-freq thing > > in EAS (which is what we do now) you should set the min cap for RT to 0. > > Is that correct ? > > By default configuration, RT tasks still go to max when uclamp is > enabled, since they get a util_min=1024. > > If we want to save power on RT tasks, we can set a smaller util_min... > but not necessarily 0. A util_min=0 for RT tasks means to use just > cpu_util_rt() for that class. Ah, sorry, I guess my message was misleading. I'm saying this is changing the way _EAS_ deals with RT tasks. Right now we don't actually consider the RT-go-to-max thing at all in the EAS prediction. Your patch is changing that AFAICT. It actually changes the way EAS sees RT tasks even without uclamp ... But I'm not hostile to the idea since it's possible to enable uclamp and set the min cap to 0 for RT if you want. So there is a story there. However, I think this needs be documented somewhere, at the very least. Thanks, Quentin
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On 22-Jan 12:13, Quentin Perret wrote: > On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > > The Energy Aware Scheduler (AES) estimates the energy impact of waking [...] > > + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) { > > + cfs_util = cpu_util_next(cpu, p, dst_cpu); > > + > > + /* > > +* Busy time computation: utilization clamping is not > > +* required since the ratio (sum_util / cpu_capacity) > > +* is already enough to scale the EM reported power > > +* consumption at the (eventually clamped) cpu_capacity. > > +*/ > > Right. > > > + sum_util += schedutil_cpu_util(cpu, cfs_util, cpu_cap, > > + ENERGY_UTIL, NULL); > > + > > + /* > > +* Performance domain frequency: utilization clamping > > +* must be considered since it affects the selection > > +* of the performance domain frequency. > > +*/ > > So that actually affects the way we deal with RT I think. I assume the > idea is to say if you don't want to reflect the RT-go-to-max-freq thing > in EAS (which is what we do now) you should set the min cap for RT to 0. > Is that correct ? By default configuration, RT tasks still go to max when uclamp is enabled, since they get a util_min=1024. If we want to save power on RT tasks, we can set a smaller util_min... but not necessarily 0. A util_min=0 for RT tasks means to use just cpu_util_rt() for that class. > I'm fine with this conceptually but maybe the specific case of RT should > be mentioned somewhere in the commit message or so ? I think it's > important to say that clearly since this patch changes the default > behaviour. Default behavior for RT should not be affected. While a capping is possible for those tasks... where do you see issues ? Here we are just figuring out what's the capacity the task will run at, if we will have clamped RT tasks will not be the max but: is that a problem ? > > + cpu_util = schedutil_cpu_util(cpu, cfs_util, cpu_cap, > > + FREQUENCY_UTIL, > > + cpu == dst_cpu ? p : > > NULL); > > + max_util = max(max_util, cpu_util); > > } > > > > energy += em_pd_energy(pd->em_pd, max_util, sum_util); > > Thanks, > Quentin -- #include Patrick Bellasi
Re: [PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
On Tuesday 15 Jan 2019 at 10:15:08 (+), Patrick Bellasi wrote: > The Energy Aware Scheduler (AES) estimates the energy impact of waking s/AES/EAS :-) [...] > + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) { > + cfs_util = cpu_util_next(cpu, p, dst_cpu); > + > + /* > + * Busy time computation: utilization clamping is not > + * required since the ratio (sum_util / cpu_capacity) > + * is already enough to scale the EM reported power > + * consumption at the (eventually clamped) cpu_capacity. > + */ Right. > + sum_util += schedutil_cpu_util(cpu, cfs_util, cpu_cap, > +ENERGY_UTIL, NULL); > + > + /* > + * Performance domain frequency: utilization clamping > + * must be considered since it affects the selection > + * of the performance domain frequency. > + */ So that actually affects the way we deal with RT I think. I assume the idea is to say if you don't want to reflect the RT-go-to-max-freq thing in EAS (which is what we do now) you should set the min cap for RT to 0. Is that correct ? I'm fine with this conceptually but maybe the specific case of RT should be mentioned somewhere in the commit message or so ? I think it's important to say that clearly since this patch changes the default behaviour. > + cpu_util = schedutil_cpu_util(cpu, cfs_util, cpu_cap, > + FREQUENCY_UTIL, > + cpu == dst_cpu ? p : > NULL); > + max_util = max(max_util, cpu_util); > } > > energy += em_pd_energy(pd->em_pd, max_util, sum_util); Thanks, Quentin
[PATCH v6 11/16] sched/fair: Add uclamp support to energy_compute()
The Energy Aware Scheduler (AES) estimates the energy impact of waking up a task on a given CPU. This estimation is based on: a) an (active) power consumptions defined for each CPU frequency b) an estimation of which frequency will be used on each CPU c) an estimation of the busy time (utilization) of each CPU Utilization clamping can affect both b) and c) estimations. A CPU is expected to run: - on an higher than required frequency, but for a shorter time, in case its estimated utilization will be smaller then the minimum utilization enforced by uclamp - on a smaller than required frequency, but for a longer time, in case its estimated utilization is bigger then the maximum utilization enforced by uclamp While effects on busy time for both boosted/capped tasks are already considered by compute_energy(), clamping effects on frequency selection are currently ignored by that function. Fix it by considering how CPU clamp values will be affected by a task waking up and being RUNNABLE on that CPU. Do that by refactoring schedutil_freq_util() to take an additional task_struct* which allows EAS to evaluate the impact on clamp values of a task being eventually queued in a CPU. Clamp values are applied to the RT+CFS utilization only when a FREQUENCY_UTIL is required by compute_energy(). Since we are at that: - rename schedutil_freq_util() into schedutil_cpu_util(), since it's not only used for frequency selection. - use "unsigned int" instead of "unsigned long" whenever the tracked utilization value is not expected to overflow 32bit. Signed-off-by: Patrick Bellasi Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rafael J. Wysocki --- kernel/sched/cpufreq_schedutil.c | 26 +++--- kernel/sched/fair.c | 37 ++-- kernel/sched/sched.h | 19 +--- 3 files changed, 48 insertions(+), 34 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 38a05a4f78cc..4e02b419c482 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -195,10 +195,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, * based on the task model parameters and gives the minimal utilization * required to meet deadlines. */ -unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs, - unsigned long max, enum schedutil_type type) +unsigned int schedutil_cpu_util(int cpu, unsigned int util_cfs, +unsigned int max, enum schedutil_type type, +struct task_struct *p) { - unsigned long dl_util, util, irq; + unsigned int dl_util, util, irq; struct rq *rq = cpu_rq(cpu); /* @@ -222,13 +223,9 @@ unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs, * When there are no CFS RUNNABLE tasks, clamps are released and * frequency will be gracefully reduced with the utilization decay. */ - util = cpu_util_rt(rq); - if (type == FREQUENCY_UTIL) { - util += cpu_util_cfs(rq); - util = uclamp_util(rq, util); - } else { - util += util_cfs; - } + util = cpu_util_rt(rq) + util_cfs; + if (type == FREQUENCY_UTIL) + util = uclamp_util_with(rq, util, p); dl_util = cpu_util_dl(rq); @@ -282,13 +279,14 @@ unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs, static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu) { struct rq *rq = cpu_rq(sg_cpu->cpu); - unsigned long util = cpu_util_cfs(rq); - unsigned long max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu); + unsigned int util_cfs = cpu_util_cfs(rq); + unsigned int cpu_cap = arch_scale_cpu_capacity(NULL, sg_cpu->cpu); - sg_cpu->max = max; + sg_cpu->max = cpu_cap; sg_cpu->bw_dl = cpu_bw_dl(rq); - return schedutil_freq_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL); + return schedutil_cpu_util(sg_cpu->cpu, util_cfs, cpu_cap, + FREQUENCY_UTIL, NULL); } /** diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5de061b055d2..7f8ca3b02dec 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6424,11 +6424,20 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu) static long compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) { - long util, max_util, sum_util, energy = 0; + unsigned int max_util, cfs_util, cpu_util, cpu_cap; + unsigned long sum_util, energy = 0; int cpu; for (; pd; pd = pd->next) { + struct cpumask *pd_mask = perf_domain_span(pd); + + /* +* The energy model mandate all the CPUs of a performance +* domain have the same capacity. +*/ + cpu_cap