Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Wed, Mar 16, 2016 at 9:53 AM, Peter Zijlstra wrote: > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote: >> I wonder if it's really worth passing per sched_class request to >> sched_util ? sched_util is about selecting a frequency based on the >> utilization of the CPU, it only needs a value that reflect the whole >> utilization. Can't we sum (or whatever the formula we want to apply) >> utilizations before calling cpufreq_update_util > > So I've thought the same; but I'm conflicted, its a shame to compute > anything if the call then doesn't do anything with it. > > And keeping a structure of all the various numbers to pass in also has > cost of yet another cacheline to touch. In principle we can use high-order bits of util and max to encode the information on where they come from. Of course, that translates to additional ifs in the governor, but I guess they are unavoidable anyway.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Wed, Mar 16, 2016 at 01:39:10PM +0100, Rafael J. Wysocki wrote: > On Wed, Mar 16, 2016 at 9:53 AM, Peter Zijlstra wrote: > > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote: > >> I wonder if it's really worth passing per sched_class request to > >> sched_util ? sched_util is about selecting a frequency based on the > >> utilization of the CPU, it only needs a value that reflect the whole > >> utilization. Can't we sum (or whatever the formula we want to apply) > >> utilizations before calling cpufreq_update_util > > > > So I've thought the same; but I'm conflicted, its a shame to compute > > anything if the call then doesn't do anything with it. > > > > And keeping a structure of all the various numbers to pass in also has > > cost of yet another cacheline to touch. > > In principle we can use high-order bits of util and max to encode the > information on where they come from. > > Of course, that translates to additional ifs in the governor, but I > guess they are unavoidable anyway. Another thing we can do, for as long as we have the indirect function call anyway, is stuff extra pointers in that same cacheline we pull the function from. Something like the below; there's room for 8 pointers (including the function pointer) in a cacheline. That would allow the callback to fetch whatever data it feels is required (could be all of it). We could also put a u64 *now = &rq->clock in, which would leave another 4 pointers for DL/RT support. And since we're then back to 1-2 arguments on the function, we can add a flags/mask field to indicate what changed (and if the function throttles, it can keep a mask of all that changed since last time it actually did something, or allow punching through the throttle if our minimum guarantee changes or whatnot). (this would of course require we allocate struct update_util_data with the proper alignment thingies etc..) Then again, maybe this is somewhat overboard :-) diff --git a/include/linux/sched.h b/include/linux/sched.h index ba49c9efd0b2..d34d75c5cc93 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -3236,8 +3236,10 @@ static inline unsigned long rlimit_max(unsigned int limit) #ifdef CONFIG_CPU_FREQ struct update_util_data { - void (*func)(struct update_util_data *data, -u64 time, unsigned long util, unsigned long max); + unsigned long *cfs_util_avg; + unsigned long *cfs_util_max; + + void (*func)(struct update_util_data *data, u64 time); }; void cpufreq_set_update_util_data(int cpu, struct update_util_data *data); diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c index 928c4ba32f68..de5b20b11de3 100644 --- a/kernel/sched/cpufreq.c +++ b/kernel/sched/cpufreq.c @@ -32,6 +32,9 @@ void cpufreq_set_update_util_data(int cpu, struct update_util_data *data) if (WARN_ON(data && !data->func)) return; + data->cfs_util_avg = &cpu_rq(cpu)->cfs.avg.util_avg; + data->cfs_util_max = &cpu_rq(cpu)->cpu_capacity_orig; + rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data); } EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Wed, Mar 16, 2016 at 02:23:21PM +0100, Rafael J. Wysocki wrote: > On Wed, Mar 16, 2016 at 2:10 PM, Peter Zijlstra wrote: > > (this would of course require we allocate struct update_util_data with > > the proper alignment thingies etc..) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index ba49c9efd0b2..d34d75c5cc93 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -3236,8 +3236,10 @@ static inline unsigned long rlimit_max(unsigned int > > limit) > > > > #ifdef CONFIG_CPU_FREQ > > struct update_util_data { > > - void (*func)(struct update_util_data *data, > > -u64 time, unsigned long util, unsigned long max); > > + unsigned long *cfs_util_avg; > > + unsigned long *cfs_util_max; > > + > > + void (*func)(struct update_util_data *data, u64 time); > > }; we should add: cacheline_aligned here > How do we ensure proper alignment? Depends on the allocator; not all of them respect the struct alignment attribute. kernel/sched/cpufreq.c:DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data); That one could use: DEFINE_PER_CPU_ALIGNED() instead as would this one: drivers/cpufreq/cpufreq_governor.c:static DEFINE_PER_CPU(struct cpu_dbs_info, cpu_dbs); Because when you cacheline align dbs_info, its update_util_data member will also get the correct alignment because of the structure attribute.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Wed, Mar 16, 2016 at 2:10 PM, Peter Zijlstra wrote: > On Wed, Mar 16, 2016 at 01:39:10PM +0100, Rafael J. Wysocki wrote: >> On Wed, Mar 16, 2016 at 9:53 AM, Peter Zijlstra wrote: >> > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote: >> >> I wonder if it's really worth passing per sched_class request to >> >> sched_util ? sched_util is about selecting a frequency based on the >> >> utilization of the CPU, it only needs a value that reflect the whole >> >> utilization. Can't we sum (or whatever the formula we want to apply) >> >> utilizations before calling cpufreq_update_util >> > >> > So I've thought the same; but I'm conflicted, its a shame to compute >> > anything if the call then doesn't do anything with it. >> > >> > And keeping a structure of all the various numbers to pass in also has >> > cost of yet another cacheline to touch. >> >> In principle we can use high-order bits of util and max to encode the >> information on where they come from. >> >> Of course, that translates to additional ifs in the governor, but I >> guess they are unavoidable anyway. > > Another thing we can do, for as long as we have the indirect function > call anyway, is stuff extra pointers in that same cacheline we pull the > function from. > > Something like the below; there's room for 8 pointers (including the > function pointer) in a cacheline. > > That would allow the callback to fetch whatever data it feels is > required (could be all of it). > > We could also put a u64 *now = &rq->clock in, which would leave another > 4 pointers for DL/RT support. > > And since we're then back to 1-2 arguments on the function, we can add a > flags/mask field to indicate what changed (and if the function > throttles, it can keep a mask of all that changed since last time it > actually did something, or allow punching through the throttle if our > minimum guarantee changes or whatnot). > > (this would of course require we allocate struct update_util_data with > the proper alignment thingies etc..) > > Then again, maybe this is somewhat overboard :-) I was thinking about something along these lines, but then I thought that passing in registers would be more efficient. One advantage I can see here is that we don't pass arguments that may not be used by the callee. > diff --git a/include/linux/sched.h b/include/linux/sched.h > index ba49c9efd0b2..d34d75c5cc93 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -3236,8 +3236,10 @@ static inline unsigned long rlimit_max(unsigned int > limit) > > #ifdef CONFIG_CPU_FREQ > struct update_util_data { > - void (*func)(struct update_util_data *data, > -u64 time, unsigned long util, unsigned long max); > + unsigned long *cfs_util_avg; > + unsigned long *cfs_util_max; > + > + void (*func)(struct update_util_data *data, u64 time); > }; How do we ensure proper alignment? > void cpufreq_set_update_util_data(int cpu, struct update_util_data *data); > diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c > index 928c4ba32f68..de5b20b11de3 100644 > --- a/kernel/sched/cpufreq.c > +++ b/kernel/sched/cpufreq.c > @@ -32,6 +32,9 @@ void cpufreq_set_update_util_data(int cpu, struct > update_util_data *data) > if (WARN_ON(data && !data->func)) > return; > > + data->cfs_util_avg = &cpu_rq(cpu)->cfs.avg.util_avg; > + data->cfs_util_max = &cpu_rq(cpu)->cpu_capacity_orig; > + > rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data); > } > EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On 16 March 2016 at 09:53, Peter Zijlstra wrote: > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote: >> I wonder if it's really worth passing per sched_class request to >> sched_util ? sched_util is about selecting a frequency based on the >> utilization of the CPU, it only needs a value that reflect the whole >> utilization. Can't we sum (or whatever the formula we want to apply) >> utilizations before calling cpufreq_update_util > > So I've thought the same; but I'm conflicted, its a shame to compute > anything if the call then doesn't do anything with it. yes, at least we shoud skip all that stuff (including adding a margin) if no hook has been set in cpufreq_update_util. I also see potential optimization of updating the value only if the utilization has been decayed > > And keeping a structure of all the various numbers to pass in also has > cost of yet another cacheline to touch.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote: > I wonder if it's really worth passing per sched_class request to > sched_util ? sched_util is about selecting a frequency based on the > utilization of the CPU, it only needs a value that reflect the whole > utilization. Can't we sum (or whatever the formula we want to apply) > utilizations before calling cpufreq_update_util So I've thought the same; but I'm conflicted, its a shame to compute anything if the call then doesn't do anything with it. And keeping a structure of all the various numbers to pass in also has cost of yet another cacheline to touch.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On 16 March 2016 at 08:41, Peter Zijlstra wrote: > On Tue, Mar 15, 2016 at 03:06:09PM -0700, Michael Turquette wrote: >> Quoting Peter Zijlstra (2016-03-15 14:25:20) >> > On Sun, Mar 13, 2016 at 10:22:09PM -0700, Michael Turquette wrote: >> > > +++ b/include/linux/sched.h >> > > @@ -2362,15 +2362,25 @@ extern u64 scheduler_tick_max_deferment(void); >> > > static inline bool sched_can_stop_tick(void) { return false; } >> > > #endif >> > > >> > > +enum sched_class_util { >> > > + cfs_util, >> > > + rt_util, >> > > + dl_util, >> > > + nr_util_types, >> > > +}; >> > > + >> > > #ifdef CONFIG_CPU_FREQ >> > > struct freq_update_hook { >> > > + void (*func)(struct freq_update_hook *hook, >> > > + enum sched_class_util sched_class, u64 time, >> > >unsigned long util, unsigned long max); >> > > }; >> > > >> > Have you looked at the asm that generated? At some point you'll start >> > spilling on the stack and it'll be a god awful mess. >> > >> >> Is your complaint about the enum that I added to the existing function >> signature, or do you not like the original function signature[0] from >> Rafael's patch, sans enum? > > No, my complaint is more about the call ABI for all our platforms, at > some point we start passing arguments on the stack instead of through > registers. > > I'm not sure where that starts hurting, but its always a concern when > adding arguments to functions. I wonder if it's really worth passing per sched_class request to sched_util ? sched_util is about selecting a frequency based on the utilization of the CPU, it only needs a value that reflect the whole utilization. Can't we sum (or whatever the formula we want to apply) utilizations before calling cpufreq_update_util
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Tue, Mar 15, 2016 at 03:06:09PM -0700, Michael Turquette wrote: > Quoting Peter Zijlstra (2016-03-15 14:25:20) > > On Sun, Mar 13, 2016 at 10:22:09PM -0700, Michael Turquette wrote: > > > +++ b/include/linux/sched.h > > > @@ -2362,15 +2362,25 @@ extern u64 scheduler_tick_max_deferment(void); > > > static inline bool sched_can_stop_tick(void) { return false; } > > > #endif > > > > > > +enum sched_class_util { > > > + cfs_util, > > > + rt_util, > > > + dl_util, > > > + nr_util_types, > > > +}; > > > + > > > #ifdef CONFIG_CPU_FREQ > > > struct freq_update_hook { > > > + void (*func)(struct freq_update_hook *hook, > > > + enum sched_class_util sched_class, u64 time, > > >unsigned long util, unsigned long max); > > > }; > > > > > Have you looked at the asm that generated? At some point you'll start > > spilling on the stack and it'll be a god awful mess. > > > > Is your complaint about the enum that I added to the existing function > signature, or do you not like the original function signature[0] from > Rafael's patch, sans enum? No, my complaint is more about the call ABI for all our platforms, at some point we start passing arguments on the stack instead of through registers. I'm not sure where that starts hurting, but its always a concern when adding arguments to functions.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
Hi Mike, On 03/15/2016 03:06 PM, Michael Turquette wrote: >>> > > void cpufreq_set_freq_update_hook(int cpu, struct freq_update_hook >>> > > *hook, >>> > > + void (*func)(struct freq_update_hook *hook, >>> > > + enum sched_class_util sched_class, >>> > > + u64 time, unsigned long util, >>> > > + unsigned long max)); >> > >> > Have you looked at the asm that generated? At some point you'll start >> > spilling on the stack and it'll be a god awful mess. >> > > Is your complaint about the enum that I added to the existing function > signature, or do you not like the original function signature[0] from > Rafael's patch, sans enum? The ARM procedure call standard has the first four arguments in registers so the addition of the enum above will start using the stack.
Re: [PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
On Sun, Mar 13, 2016 at 10:22:09PM -0700, Michael Turquette wrote: > +++ b/include/linux/sched.h > @@ -2362,15 +2362,25 @@ extern u64 scheduler_tick_max_deferment(void); > static inline bool sched_can_stop_tick(void) { return false; } > #endif > > +enum sched_class_util { > + cfs_util, > + rt_util, > + dl_util, > + nr_util_types, > +}; > + > #ifdef CONFIG_CPU_FREQ > struct freq_update_hook { > + void (*func)(struct freq_update_hook *hook, > + enum sched_class_util sched_class, u64 time, >unsigned long util, unsigned long max); > }; > > void cpufreq_set_freq_update_hook(int cpu, struct freq_update_hook *hook, > + void (*func)(struct freq_update_hook *hook, > + enum sched_class_util sched_class, > + u64 time, unsigned long util, > + unsigned long max)); Have you looked at the asm that generated? At some point you'll start spilling on the stack and it'll be a god awful mess.
[PATCH 5/8] sched/cpufreq: pass sched class into cpufreq_update_util
cpufreq_update_util() accepts a single utilization value which does not account for multiple utilization contributions from the cfs, rt & dl scheduler classes. Begin fixing this by adding a sched_class argument to cpufreq_update_util(), all of its call sites and the governor-specific hooks in intel_pstate.c, cpufreq_schedutil.c and cpufreq_governor.c. A follow-on patch will add summation of the sched_class contributions to the schedutil governor. Signed-off-by: Michael Turquette --- drivers/cpufreq/cpufreq_governor.c | 5 +++-- drivers/cpufreq/cpufreq_schedutil.c | 6 -- drivers/cpufreq/intel_pstate.c | 5 +++-- include/linux/sched.h | 16 +--- kernel/sched/cpufreq.c | 11 +++ kernel/sched/deadline.c | 2 +- kernel/sched/fair.c | 2 +- kernel/sched/rt.c | 2 +- kernel/sched/sched.h| 8 +--- 9 files changed, 38 insertions(+), 19 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 148576c..4694751 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -248,8 +248,9 @@ static void dbs_irq_work(struct irq_work *irq_work) schedule_work(&policy_dbs->work); } -static void dbs_freq_update_handler(struct freq_update_hook *hook, u64 time, - unsigned long util_not_used, +static void dbs_freq_update_handler(struct freq_update_hook *hook, + enum sched_class_util sc_not_used, + u64 time, unsigned long util_not_used, unsigned long max_not_used) { struct cpu_dbs_info *cdbs = container_of(hook, struct cpu_dbs_info, update_hook); diff --git a/drivers/cpufreq/cpufreq_schedutil.c b/drivers/cpufreq/cpufreq_schedutil.c index 12e49b9..18d9ca3 100644 --- a/drivers/cpufreq/cpufreq_schedutil.c +++ b/drivers/cpufreq/cpufreq_schedutil.c @@ -106,7 +106,8 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, trace_cpu_frequency(freq, smp_processor_id()); } -static void sugov_update_single(struct freq_update_hook *hook, u64 time, +static void sugov_update_single(struct freq_update_hook *hook, + enum sched_class_util sc, u64 time, unsigned long util, unsigned long max) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook); @@ -166,7 +167,8 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy, return util * max_f / max; } -static void sugov_update_shared(struct freq_update_hook *hook, u64 time, +static void sugov_update_shared(struct freq_update_hook *hook, + enum sched_class_util sc, u64 time, unsigned long util, unsigned long max) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook); diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 20e2bb2..86aa368 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1020,8 +1020,9 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) sample->freq); } -static void intel_pstate_freq_update(struct freq_update_hook *hook, u64 time, -unsigned long util_not_used, +static void intel_pstate_freq_update(struct freq_update_hook *hook, +enum sched_class_util sc_not_used +u64 time, unsigned long util_not_used, unsigned long max_not_used) { struct cpudata *cpu = container_of(hook, struct cpudata, update_hook); diff --git a/include/linux/sched.h b/include/linux/sched.h index f18a99b..1c7d7bd 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2362,15 +2362,25 @@ extern u64 scheduler_tick_max_deferment(void); static inline bool sched_can_stop_tick(void) { return false; } #endif +enum sched_class_util { + cfs_util, + rt_util, + dl_util, + nr_util_types, +}; + #ifdef CONFIG_CPU_FREQ struct freq_update_hook { - void (*func)(struct freq_update_hook *hook, u64 time, + void (*func)(struct freq_update_hook *hook, +enum sched_class_util sched_class, u64 time, unsigned long util, unsigned long max); }; void cpufreq_set_freq_update_hook(int cpu, struct freq_update_hook *hook, - void (*func)(struct freq_update_hook *hook, u64 time, -unsigned long util, unsigned long max)); + void (*func)(struct freq_update_hook *hook, +enum sched_class_util sched_class, +u64 time, unsigned long util, +