Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>
> > In that case I guess I'll have to add signed versions of the
> > read_uint/write_uint methods.
>
> Yes, I looked at that, I found the interface somewhat unfortunate, it
> would mean growing the struct with two more function pointers.

Is that really a big deal? We're talking about a structure that has a
small number (<10 in the current tree) of instances per cgroup
subsystem.

> Perhaps a
> read and write function with abstract data would be better suited. That
> would allow for this and more. Sadly it loses type information.

If the size of the struct cftype really became a problem, I think the
cleanest way to fix it would be to have a union of the potential
function pointers, and add a field to specify which one is in use.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
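The union-plus-tag idea from the message above could look roughly like the following userspace sketch. It is only an illustration: struct cft_sketch, enum cft_kind, cft_format() and the demo readers are hypothetical names, not the kernel's struct cftype or any posted patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tag recording which union member is valid. */
enum cft_kind { CFT_UINT, CFT_SINT };

/* Hypothetical, simplified stand-in for struct cftype. */
struct cft_sketch {
	const char *name;
	enum cft_kind kind;
	union {
		uint64_t (*read_uint)(void *ctx);
		int64_t (*read_sint)(void *ctx);
	} read;
};

/* Demo readers: ctx points at the stored value. */
static uint64_t demo_read_u64(void *ctx) { return *(uint64_t *)ctx; }
static int64_t demo_read_s64(void *ctx) { return *(int64_t *)ctx; }

/*
 * Format the value as "<number>\n" into buf, dispatching on the tag so
 * the signedness is not lost. Returns what snprintf() returns.
 */
static int cft_format(const struct cft_sketch *cft, void *ctx,
		      char *buf, size_t len)
{
	assert(cft != NULL && buf != NULL);

	switch (cft->kind) {
	case CFT_UINT:
		return snprintf(buf, len, "%" PRIu64 "\n",
				cft->read.read_uint(ctx));
	case CFT_SINT:
		return snprintf(buf, len, "%" PRId64 "\n",
				cft->read.read_sint(ctx));
	}
	return -1;
}
```

With this layout the struct grows by one small tag rather than by one pointer per value type, at the cost of a switch on every access.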
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, 2008-02-23 at 12:02 -0800, Paul Menage wrote:
> On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > If so, could we avoid that problem by using 0 rather than -1 as the
> > > "unlimited" value? It looks from what I've read in the Documentation
> > > changes as though 0 isn't really a meaningful value.
> >
> > 0 means no time, quite useful and clearly distinct from inf. time.
> >
>
> So a real-time task in a cgroup with a 0 rt_runtime can be in the R
> state but never actually get to run? OK, if people need to be able to
> do that then fair enough.

Yeah, it's an awkward situation, and we refuse new rt tasks in such
groups.

But the 0 value is needed so you can have groups that don't participate
in the realtime scheduling, because we enforce a schedulability
constraint over the groups.

Each group has a runtime ratio, namely: rt_runtime / rt_period. The sum
of this ratio over all groups must be smaller than or equal to the
global ratio, which must be smaller than or equal to 1.

> In that case I guess I'll have to add signed versions of the
> read_uint/write_uint methods.

Yes, I looked at that, I found the interface somewhat unfortunate, it
would mean growing the struct with two more function pointers. Perhaps a
read and write function with abstract data would be better suited. That
would allow for this and more. Sadly it loses type information.
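The constraint described above (sum of rt_runtime / rt_period over all groups <= global ratio <= 1) can be sketched in userspace C as follows. This is a hedged illustration, not the patch's __rt_schedulable(): the struct and function names are made up, and the 16-bit fixed-point scale is borrowed from to_ratio() in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATIO_SHIFT		16		/* same scale as to_ratio() */
#define RUNTIME_INF_SKETCH	UINT64_MAX	/* mirrors RUNTIME_INF */

/* Hypothetical per-group settings, both in microseconds. */
struct rt_group_sketch {
	uint64_t period_us;
	uint64_t runtime_us;
};

/* runtime / period as a fixed-point fraction of 1 << RATIO_SHIFT. */
static uint64_t to_ratio_sketch(uint64_t period, uint64_t runtime)
{
	assert(period != 0);

	if (runtime == RUNTIME_INF_SKETCH)
		return 1ULL << RATIO_SHIFT;

	return (runtime << RATIO_SHIFT) / period;
}

/* Returns 1 if sum(group ratios) <= global ratio <= 1, else 0. */
static int rt_schedulable_sketch(const struct rt_group_sketch *groups,
				 size_t n, uint64_t global_period_us,
				 uint64_t global_runtime_us)
{
	uint64_t global_ratio = to_ratio_sketch(global_period_us,
						global_runtime_us);
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += to_ratio_sketch(groups[i].period_us,
					 groups[i].runtime_us);

	return global_ratio <= (1ULL << RATIO_SHIFT) && total <= global_ratio;
}
```

A group with runtime 0 contributes a ratio of 0, so it can always exist without affecting the sum, which is the case argued for above.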
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > If so, could we avoid that problem by using 0 rather than -1 as the
> > "unlimited" value? It looks from what I've read in the Documentation
> > changes as though 0 isn't really a meaningful value.
>
> 0 means no time, quite useful and clearly distinct from inf. time.
>

So a real-time task in a cgroup with a 0 rt_runtime can be in the R
state but never actually get to run? OK, if people need to be able to
do that then fair enough.

In that case I guess I'll have to add signed versions of the
read_uint/write_uint methods.

Paul
Re: [PATCH 3/8] sched: rt-group: interface
On Sat, 2008-02-23 at 11:48 -0800, Paul Menage wrote:
> On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
> > +				struct file *file,
> > +				const char __user *userbuf,
> > +				size_t nbytes, loff_t *unused_ppos)
> > +{
> > +	char buffer[64];
> > +	int retval = 0;
> > +	s64 val;
> > +	char *end;
> > +
> > +	if (!nbytes)
> > +		return -EINVAL;
> > +	if (nbytes >= sizeof(buffer))
> > +		return -E2BIG;
> > +	if (copy_from_user(buffer, userbuf, nbytes))
> > +		return -EFAULT;
> > +
> > +	buffer[nbytes] = 0;	/* nul-terminate */
> > +
> > +	/* strip newline if necessary */
> > +	if (nbytes && (buffer[nbytes-1] == '\n'))
> > +		buffer[nbytes-1] = 0;
> > +	val = simple_strtoll(buffer, &end, 0);
> > +	if (*end)
> > +		return -EINVAL;
> > +
> > +	/* Pass to subsystem */
> > +	retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
> > +	if (!retval)
> > +		retval = nbytes;
> > +	return retval;
> >  }
> >
> > -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
> > -{
> > -	struct task_group *tg = cgroup_tg(cgrp);
> > +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
> > +				   struct file *file,
> > +				   char __user *buf, size_t nbytes,
> > +				   loff_t *ppos)
> > +{
> > +	char tmp[64];
> > +	long val = sched_group_rt_runtime(cgroup_tg(cgrp));
> > +	int len = sprintf(tmp, "%ld\n", val);
> >
> > -	return (u64) tg->rt_ratio;
> > +	return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
> >  }
>
> What's the reason that you can't use the cgroup read_uint/write_uint
> methods for this? Is it just because you have -1 as your "unlimited"
> value.

Yes.

> If so, could we avoid that problem by using 0 rather than -1 as the
> "unlimited" value? It looks from what I've read in the Documentation
> changes as though 0 isn't really a meaningful value.

0 means no time, quite useful and clearly distinct from inf. time.
Re: [PATCH 3/8] sched: rt-group: interface
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
> +				struct file *file,
> +				const char __user *userbuf,
> +				size_t nbytes, loff_t *unused_ppos)
> +{
> +	char buffer[64];
> +	int retval = 0;
> +	s64 val;
> +	char *end;
> +
> +	if (!nbytes)
> +		return -EINVAL;
> +	if (nbytes >= sizeof(buffer))
> +		return -E2BIG;
> +	if (copy_from_user(buffer, userbuf, nbytes))
> +		return -EFAULT;
> +
> +	buffer[nbytes] = 0;	/* nul-terminate */
> +
> +	/* strip newline if necessary */
> +	if (nbytes && (buffer[nbytes-1] == '\n'))
> +		buffer[nbytes-1] = 0;
> +	val = simple_strtoll(buffer, &end, 0);
> +	if (*end)
> +		return -EINVAL;
> +
> +	/* Pass to subsystem */
> +	retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
> +	if (!retval)
> +		retval = nbytes;
> +	return retval;
>  }
>
> -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
> -{
> -	struct task_group *tg = cgroup_tg(cgrp);
> +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
> +				   struct file *file,
> +				   char __user *buf, size_t nbytes,
> +				   loff_t *ppos)
> +{
> +	char tmp[64];
> +	long val = sched_group_rt_runtime(cgroup_tg(cgrp));
> +	int len = sprintf(tmp, "%ld\n", val);
>
> -	return (u64) tg->rt_ratio;
> +	return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
>  }

What's the reason that you can't use the cgroup read_uint/write_uint
methods for this? Is it just because you have -1 as your "unlimited"
value?

If so, could we avoid that problem by using 0 rather than -1 as the
"unlimited" value? It looks from what I've read in the Documentation
changes as though 0 isn't really a meaningful value.

Paul
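For readers following the parsing logic in the quoted write handler, here is a rough userspace analogue. It is a sketch under assumptions: parse_rt_runtime() is a hypothetical name, memcpy() and strtoll() stand in for copy_from_user() and simple_strtoll(), and the empty-input and "< -1" checks are stricter than the quoted code (the lower bound comes from the documented range [ -1, INT_MAX - 1 ]).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a signed runtime value such as "950000\n" or "-1\n".
 * Returns 0 and stores the value in *out, or a negative errno.
 */
static int parse_rt_runtime(const char *input, size_t nbytes, long long *out)
{
	char buffer[64];
	char *end;
	long long val;

	assert(input != NULL && out != NULL);

	if (!nbytes)
		return -EINVAL;
	if (nbytes >= sizeof(buffer))
		return -E2BIG;

	memcpy(buffer, input, nbytes);
	buffer[nbytes] = '\0';			/* nul-terminate */

	/* strip one trailing newline, as echo(1) would add */
	if (buffer[nbytes - 1] == '\n')
		buffer[nbytes - 1] = '\0';

	errno = 0;
	val = strtoll(buffer, &end, 0);
	if (end == buffer || *end)		/* empty or trailing junk */
		return -EINVAL;
	if (errno == ERANGE)
		return -ERANGE;
	if (val < -1)				/* -1 is the only negative */
		return -EINVAL;

	*out = val;
	return 0;
}
```

The point of the exercise is that -1 has to survive the round trip, which an unsigned-only read_uint/write_uint pair cannot express.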
Re: [PATCH 3/8] sched: rt-group: interface
On Mon, 04 Feb 2008 22:03:01 +0100 Peter Zijlstra wrote:

> Change the rt_ratio interface to rt_runtime_us, to match rt_period_us.
> This avoids picking a granularity for the ratio.
>
> Extend the /sys/kernel/uids// interface to allow setting
> the group's rt_runtime.
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  Documentation/ABI/testing/sysfs-kernel-uids |    6 +
>  Documentation/sched-rt-group.txt            |   59 +++
>  include/linux/sched.h                       |    7 -
>  kernel/sched.c                              |  145 +---
>  kernel/sched_rt.c                           |   53 --
>  kernel/sysctl.c                             |   32 +++---
>  kernel/user.c                               |   28 +
>  7 files changed, 250 insertions(+), 80 deletions(-)

> Index: linux-2.6/kernel/sched.c
> ===
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -7780,30 +7783,76 @@ unsigned long sched_group_shares(struct
>  }
>
>  /*
> - * Ensure the total rt_ratio <= sysctl_sched_rt_ratio
> + * Ensure that the real time constraints are schedulable.
>   */
> -int sched_group_set_rt_ratio(struct task_group *tg, unsigned long rt_ratio)
> +static DEFINE_MUTEX(rt_constraints_mutex);
> +
> +static unsigned long to_ratio(u64 period, u64 runtime)
> +{
> +	if (runtime == RUNTIME_INF)
> +		return 1ULL << 16;
> +
> +	runtime *= (1ULL << 16);
> +	do_div(runtime, period);

Isn't do_div() defined as taking (uint64_t, uint32_t) ?

> +	return runtime;
> +}
> +

> Index: linux-2.6/Documentation/sched-rt-group.txt
> ===
> --- /dev/null
> +++ linux-2.6/Documentation/sched-rt-group.txt
> @@ -0,0 +1,59 @@
> +
> +
> +Real-Time group scheduling.
> +
> +The problem space:
> +
> +In order to schedule multiple groups of realtime tasks each group must
> +be assigned a fixed portion of the cpu time available. Without a minimum

Use "cpu" or "CPU" consistently, please. (I prefer CPU, but )

> +guarantee a realtime group can obviously fall short. A fuzzy upper limit
> +is of no use since it cannot be relied upon. Which leaves us with just
> +the single fixed portion.
> +
> +CPU time is divided by means of specifying how much time can be spend

s/spend/spent/

> +running in a given period. Say a frame fixed realtime renderer must
> +deliver a 25 frames a second, which yields a period of 0.04s. Now say
   drop "a"^

> +it will also have to play some music and respond to input, leaving it
> +with around 80% for the graphics. We can then give this group a runtime
> +of 0.8 * 0.04s = 0.032s.
> +
> +This way the graphics group will have a 0.04s period with a 0.032s runtime
> +limit.
> +
> +Now if the audio thread needs to refill the dma buffer every 0.005s, but

DMA preferably.

> +needs only about 3% cpu time to do so, it will can do with a 0.03 * 0.005s

s/will can do/can do/

> += 0.00015s.
> +
> +
> +The Interface:
> +
> +system wide:
> +
> +/proc/sys/kernel/sched_rt_period_ms
> +/proc/sys/kernel/sched_rt_runtime_us
> +
> +CONFIG_FAIR_USER_SCHED
> +
> +/sys/kernel/uids//cpu_rt_runtime_us
> +
> +or
> +
> +CONFIG_FAIR_CGROUP_SCHED
> +
> +/cgroup//cpu.rt_runtime_us
> +
> +[ time is specified in us because the interface is s32, this gives an

s/,/;/

> +  operating range of ~35m to 1us ]
> +
> +The period takes values in [ 1, INT_MAX ], runtime in [ -1, INT_MAX - 1 ].
> +
> +A runtime of -1 specifies runtime == period, ie. no limit.
> +
> +New groups get the period from /proc/sys/kernel/sched_rt_period_us and
> +a runtime of 0.
> +
> +Settings are constrainted to:

constrained

> +
> +  \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
> +
> +in order to keep the configuration schedulable.

---
~Randy
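The arithmetic in the quoted documentation (0.8 * 0.04s = 0.032s, 0.03 * 0.005s = 0.00015s, and the "~35m" range of an s32 in microseconds) can be checked with a small userspace sketch. The helper names below are hypothetical and integer microseconds are assumed throughout.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC_SKETCH 1000000ULL

/* Period in microseconds for something that must happen N times a second. */
static uint64_t period_us_from_rate(uint64_t events_per_sec)
{
	assert(events_per_sec != 0);
	return USEC_PER_SEC_SKETCH / events_per_sec;
}

/* Runtime in microseconds for a given share (in percent) of the period. */
static uint64_t runtime_us_from_share(uint64_t period_us, unsigned int percent)
{
	assert(percent <= 100);
	return period_us * percent / 100;
}

/* Largest interval an s32 count of microseconds can hold, in whole minutes. */
static uint64_t s32_us_range_minutes(void)
{
	return (uint64_t)INT32_MAX / USEC_PER_SEC_SKETCH / 60;
}
```

For the renderer, 25 frames a second gives a 40000us period and an 80% share gives 32000us of runtime; for the audio thread, a 5000us period at 3% gives 150us. These are the values one would write to the *_us files, assuming the interface described in the quoted text.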
[PATCH 3/8] sched: rt-group: interface
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us.
This avoids picking a granularity for the ratio.

Extend the /sys/kernel/uids// interface to allow setting
the group's rt_runtime.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 Documentation/ABI/testing/sysfs-kernel-uids |    6 +
 Documentation/sched-rt-group.txt            |   59 +++
 include/linux/sched.h                       |    7 -
 kernel/sched.c                              |  145 +---
 kernel/sched_rt.c                           |   53 --
 kernel/sysctl.c                             |   32 +++---
 kernel/user.c                               |   28 +
 7 files changed, 250 insertions(+), 80 deletions(-)

Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1507,8 +1507,6 @@ extern unsigned int sysctl_sched_child_r
 extern unsigned int sysctl_sched_features;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
-extern unsigned int sysctl_sched_rt_period;
-extern unsigned int sysctl_sched_rt_ratio;
 #if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
 extern unsigned int sysctl_sched_min_bal_int_shares;
 extern unsigned int sysctl_sched_max_bal_int_shares;
@@ -1518,6 +1516,8 @@ int sched_nr_latency_handler(struct ctl_
 		struct file *file, void __user *buffer, size_t *length,
 		loff_t *ppos);
 #endif
+extern unsigned int sysctl_sched_rt_period;
+extern int sysctl_sched_rt_runtime;
 
 extern unsigned int sysctl_sched_compat_yield;
 
@@ -1997,6 +1997,9 @@ extern void sched_destroy_group(struct t
 extern void sched_move_task(struct task_struct *tsk);
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
 extern unsigned long sched_group_shares(struct task_group *tg);
+extern int sched_group_set_rt_runtime(struct task_group *tg,
+				      long rt_runtime_us);
+extern long sched_group_rt_runtime(struct task_group *tg);
 #endif
 
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -176,7 +176,7 @@ struct task_group {
 	struct sched_rt_entity **rt_se;
 	struct rt_rq **rt_rq;
 
-	unsigned int rt_ratio;
+	u64 rt_runtime;
 
 	/*
 	 * shares assigned to a task group governs how much of cpu bandwidth
@@ -654,19 +654,21 @@ const_debug unsigned int sysctl_sched_fe
 const_debug unsigned int sysctl_sched_nr_migrate = 32;
 
 /*
- * period over which we measure -rt task cpu usage in ms.
+ * period over which we measure -rt task cpu usage in us.
  * default: 1s
  */
-const_debug unsigned int sysctl_sched_rt_period = 1000;
+unsigned int sysctl_sched_rt_period = 100;
 
-#define SCHED_RT_FRAC_SHIFT	16
-#define SCHED_RT_FRAC	(1UL << SCHED_RT_FRAC_SHIFT)
+/*
+ * part of the period that we allow rt tasks to run in us.
+ * default: 0.95s
+ */
+int sysctl_sched_rt_runtime = 95;
 
 /*
- * ratio of time -rt tasks may consume.
- * default: 95%
+ * single value that denotes runtime == period, ie unlimited time.
  */
-const_debug unsigned int sysctl_sched_rt_ratio = 62259;
+#define RUNTIME_INF	((u64)~0ULL)
 
 /*
  * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
@@ -7191,7 +7193,8 @@ void __init sched_init(void)
 				&per_cpu(init_cfs_rq, i),
 				&per_cpu(init_sched_entity, i), i, 1);
 
-		init_task_group.rt_ratio = sysctl_sched_rt_ratio; /* XXX */
+		init_task_group.rt_runtime =
+			sysctl_sched_rt_runtime * NSEC_PER_USEC;
 		INIT_LIST_HEAD(&rq->leaf_rt_rq_list);
 		init_tg_rt_entry(rq, &init_task_group,
 				&per_cpu(init_rt_rq, i),
@@ -7586,7 +7589,7 @@ struct task_group *sched_create_group(vo
 		goto err;
 
 	tg->shares = NICE_0_LOAD;
-	tg->rt_ratio = 0; /* XXX */
+	tg->rt_runtime = 0;
 
 	for_each_possible_cpu(i) {
 		rq = cpu_rq(i);
@@ -7780,30 +7783,76 @@ unsigned long sched_group_shares(struct
 }
 
 /*
- * Ensure the total rt_ratio <= sysctl_sched_rt_ratio
+ * Ensure that the real time constraints are schedulable.
  */
-int sched_group_set_rt_ratio(struct task_group *tg, unsigned long rt_ratio)
+static DEFINE_MUTEX(rt_constraints_mutex);
+
+static unsigned long to_ratio(u64 period, u64 runtime)
+{
+	if (runtime == RUNTIME_INF)
+		return 1ULL << 16;
+
+	runtime *= (1ULL << 16);
+	do_div(runtime, period);
+	return runtime;
+}
+
+static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
 {
 	struct task_group *tgi;
 	unsigned long total = 0;
+	unsigned long global_ratio =
+