Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 12:26 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>
>  > In that case I guess I'll have to add signed versions of the
>  > read_uint/write_uint methods.
>
>  Yes, I looked at that, I found the interface somewhat unfortunate, it
>  would mean growing the struct with two more function pointers.

Is that really a big deal? We're talking about a structure that has a
small number (<10 in the current tree) of instances per cgroup
subsystem.

> Perhaps a
>  read and write function with abstract data would be better suited. That
>  would allow for this and more. Sadly it loses type information.

If the size of the struct cftype really became a problem, I think the
cleanest way to fix it would be to have a union of the potential
function pointers, and add a field to specify which one is in use.
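
The union-plus-tag layout suggested above can be sketched in plain C. This is
a minimal userspace illustration only, not the kernel's struct cftype: every
name in it (ctl_file, handler_kind, ctl_file_show, demo_runtime) is
hypothetical, and only a read path is shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not the kernel's struct cftype. */
enum handler_kind { HANDLER_U64, HANDLER_S64 };

struct ctl_file {
	const char *name;
	enum handler_kind kind;		/* says which union member is valid */
	union {
		uint64_t (*read_u64)(const void *state);
		int64_t (*read_s64)(const void *state);
	} read;				/* one pointer's worth of storage */
};

/* Example signed handler: returns the int64_t that state points at. */
static int64_t demo_runtime(const void *state)
{
	return *(const int64_t *)state;
}

/* Dispatch on the tag and format the value; returns snprintf's result. */
static int ctl_file_show(const struct ctl_file *f, const void *state,
			 char *buf, size_t len)
{
	assert(f != NULL && buf != NULL);

	switch (f->kind) {
	case HANDLER_U64:
		return snprintf(buf, len, "%" PRIu64 "\n",
				f->read.read_u64(state));
	case HANDLER_S64:
		return snprintf(buf, len, "%" PRId64 "\n",
				f->read.read_s64(state));
	}
	return -1;
}
```

The struct grows by one small tag rather than by one pointer per value type,
and the tag keeps the type information that an untyped read/write pair would
give up.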

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Peter Zijlstra

On Sat, 2008-02-23 at 12:02 -0800, Paul Menage wrote:
> On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> >  > If so, could we avoid that problem by using 0 rather than -1 as the
> >  > "unlimited" value? It looks from what I've read in the Documentation
> >  > changes as though 0 isn't really a meaningful value.
> >
> >  0 means no time, quite useful and clearly distinct from inf. time.
> >
> 
> So a real-time task in a cgroup with a 0 rt_runtime can be in the R
> state but never actually get to run? OK, if people need to be able to
> do that then fair enough.

Yeah, it's an awkward situation, and we refuse new rt tasks in such
groups. But the 0 value is needed so you can have groups that don't
participate in the realtime scheduling because we enforce a
schedulability constraint over the groups.

Each group has a runtime ratio, namely: rt_runtime / rt_period.
The sum of this ratio over all groups must be less than or equal to the
global ratio, which in turn must be less than or equal to 1.
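
The constraint described above can be sketched as a small userspace check.
This is an illustration under assumptions, not the patch's
__rt_schedulable(): the struct and function names are hypothetical, ratios
use 16 fractional bits (65536 means 100%), the "unlimited" runtime case is
left out, and runtimes are assumed small enough that shifting left by 16
does not overflow 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group settings, both in microseconds. */
struct rt_group {
	uint64_t runtime_us;
	uint64_t period_us;
};

/* runtime / period as a fixed-point fraction with 16 fractional bits. */
static uint64_t ratio_fp16(uint64_t runtime_us, uint64_t period_us)
{
	assert(period_us != 0);
	return (runtime_us << 16) / period_us;
}

/*
 * True when the per-group ratios sum to no more than the global ratio,
 * and the global ratio itself is no more than 1 (65536 in fixed point).
 */
static bool groups_schedulable(const struct rt_group *groups, size_t n,
			       uint64_t global_runtime_us,
			       uint64_t global_period_us)
{
	uint64_t global = ratio_fp16(global_runtime_us, global_period_us);
	uint64_t total = 0;
	size_t i;

	if (global > (1ULL << 16))
		return false;

	for (i = 0; i < n; i++)
		total += ratio_fp16(groups[i].runtime_us, groups[i].period_us);

	return total <= global;
}
```

A group with a runtime of 0 adds nothing to the sum, which is why it can
exist without taking part in realtime scheduling.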

> In that case I guess I'll have to add signed versions of the
> read_uint/write_uint methods.

Yes, I looked at that, I found the interface somewhat unfortunate, it
would mean growing the struct with two more function pointers. Perhaps a
read and write function with abstract data would be better suited. That
would allow for this and more. Sadly it loses type information.




Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Sat, Feb 23, 2008 at 11:57 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>  > If so, could we avoid that problem by using 0 rather than -1 as the
>  > "unlimited" value? It looks from what I've read in the Documentation
>  > changes as though 0 isn't really a meaningful value.
>
>  0 means no time, quite useful and clearly distinct from inf. time.
>

So a real-time task in a cgroup with a 0 rt_runtime can be in the R
state but never actually get to run? OK, if people need to be able to
do that then fair enough.

In that case I guess I'll have to add signed versions of the
read_uint/write_uint methods.

Paul


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Peter Zijlstra

On Sat, 2008-02-23 at 11:48 -0800, Paul Menage wrote:
> On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> >  +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
> >  +   struct file *file,
> >  +   const char __user *userbuf,
> >  +   size_t nbytes, loff_t *unused_ppos)
> >  +{
> >  +   char buffer[64];
> >  +   int retval = 0;
> >  +   s64 val;
> >  +   char *end;
> >  +
> >  +   if (!nbytes)
> >  +   return -EINVAL;
> >  +   if (nbytes >= sizeof(buffer))
> >  +   return -E2BIG;
> >  +   if (copy_from_user(buffer, userbuf, nbytes))
> >  +   return -EFAULT;
> >  +
> >  +   buffer[nbytes] = 0; /* nul-terminate */
> >  +
> >  +   /* strip newline if necessary */
> >  +   if (nbytes && (buffer[nbytes-1] == '\n'))
> >  +   buffer[nbytes-1] = 0;
> >  +   val = simple_strtoll(buffer, &end, 0);
> >  +   if (*end)
> >  +   return -EINVAL;
> >  +
> >  +   /* Pass to subsystem */
> >  +   retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
> >  +   if (!retval)
> >  +   retval = nbytes;
> >  +   return retval;
> >   }
> >
> >  -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
> >  -{
> >  -   struct task_group *tg = cgroup_tg(cgrp);
> >  +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
> >  +  struct file *file,
> >  +  char __user *buf, size_t nbytes,
> >  +  loff_t *ppos)
> >  +{
> >  +   char tmp[64];
> >  +   long val = sched_group_rt_runtime(cgroup_tg(cgrp));
> >  +   int len = sprintf(tmp, "%ld\n", val);
> >
> >  -   return (u64) tg->rt_ratio;
> >  +   return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
> >   }
> 
> What's the reason that you can't use the cgroup read_uint/write_uint
> methods for this? Is it just because you have -1 as your "unlimited"
> value?

Yes.

> If so, could we avoid that problem by using 0 rather than -1 as the
> "unlimited" value? It looks from what I've read in the Documentation
> changes as though 0 isn't really a meaningful value.

0 means no time, quite useful and clearly distinct from inf. time.
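
The distinction between 0 ("no time") and -1 ("unlimited") is what forces a
signed parse. A minimal userspace sketch follows; it is not the patch's write
handler. The function name, the RUNTIME_UNLIMITED stand-in for the patch's
RUNTIME_INF, the upper bound of INT32_MAX - 1 (taken from the documented
range) and the conversion to nanoseconds are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the patch's RUNTIME_INF sentinel. */
#define RUNTIME_UNLIMITED UINT64_MAX

/*
 * Parse text such as "-1\n", "0" or "950000\n" (microseconds) into a
 * runtime in nanoseconds. Returns 0 on success or -EINVAL on bad input.
 *   -1          -> RUNTIME_UNLIMITED (runtime == period)
 *    0          -> 0 (the group gets no realtime time at all)
 *    1 and up   -> that many microseconds, converted to nanoseconds
 */
static int parse_rt_runtime(const char *text, uint64_t *runtime_ns)
{
	char *end;
	long long us;

	assert(text != NULL && runtime_ns != NULL);

	errno = 0;
	us = strtoll(text, &end, 0);
	if (end == text || errno == ERANGE)
		return -EINVAL;

	/* Allow one trailing newline, as written by "echo". */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (us < -1 || us > (long long)INT32_MAX - 1)
		return -EINVAL;

	if (us == -1)
		*runtime_ns = RUNTIME_UNLIMITED;
	else
		*runtime_ns = (uint64_t)us * 1000;
	return 0;
}
```

An unsigned-only parser has no way to accept "-1", so it would need some
other spelling for the unlimited case.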




Re: [PATCH 3/8] sched: rt-group: interface

2008-02-23 Thread Paul Menage
On Mon, Feb 4, 2008 at 1:03 PM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
>  +static int cpu_rt_runtime_write(struct cgroup *cgrp, struct cftype *cft,
>  +   struct file *file,
>  +   const char __user *userbuf,
>  +   size_t nbytes, loff_t *unused_ppos)
>  +{
>  +   char buffer[64];
>  +   int retval = 0;
>  +   s64 val;
>  +   char *end;
>  +
>  +   if (!nbytes)
>  +   return -EINVAL;
>  +   if (nbytes >= sizeof(buffer))
>  +   return -E2BIG;
>  +   if (copy_from_user(buffer, userbuf, nbytes))
>  +   return -EFAULT;
>  +
>  +   buffer[nbytes] = 0; /* nul-terminate */
>  +
>  +   /* strip newline if necessary */
>  +   if (nbytes && (buffer[nbytes-1] == '\n'))
>  +   buffer[nbytes-1] = 0;
>  +   val = simple_strtoll(buffer, &end, 0);
>  +   if (*end)
>  +   return -EINVAL;
>  +
>  +   /* Pass to subsystem */
>  +   retval = sched_group_set_rt_runtime(cgroup_tg(cgrp), val);
>  +   if (!retval)
>  +   retval = nbytes;
>  +   return retval;
>   }
>
>  -static u64 cpu_rt_ratio_read_uint(struct cgroup *cgrp, struct cftype *cft)
>  -{
>  -   struct task_group *tg = cgroup_tg(cgrp);
>  +static ssize_t cpu_rt_runtime_read(struct cgroup *cgrp, struct cftype *cft,
>  +  struct file *file,
>  +  char __user *buf, size_t nbytes,
>  +  loff_t *ppos)
>  +{
>  +   char tmp[64];
>  +   long val = sched_group_rt_runtime(cgroup_tg(cgrp));
>  +   int len = sprintf(tmp, "%ld\n", val);
>
>  -   return (u64) tg->rt_ratio;
>  +   return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
>   }

What's the reason that you can't use the cgroup read_uint/write_uint
methods for this? Is it just because you have -1 as your "unlimited"
value?

If so, could we avoid that problem by using 0 rather than -1 as the
"unlimited" value? It looks from what I've read in the Documentation
changes as though 0 isn't really a meaningful value.

Paul


Re: [PATCH 3/8] sched: rt-group: interface

2008-02-05 Thread Randy Dunlap
On Mon, 04 Feb 2008 22:03:01 +0100 Peter Zijlstra wrote:

> Change the rt_ratio interface to rt_runtime_us, to match rt_period_us.
> This avoids picking a granularity for the ratio.
> 
> Extend the /sys/kernel/uids/<uid>/ interface to allow setting
> the group's rt_runtime.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  Documentation/ABI/testing/sysfs-kernel-uids |6 +
>  Documentation/sched-rt-group.txt|   59 +++
>  include/linux/sched.h   |7 -
>  kernel/sched.c  |  145 +---
>  kernel/sched_rt.c   |   53 --
>  kernel/sysctl.c |   32 +++---
>  kernel/user.c   |   28 +
>  7 files changed, 250 insertions(+), 80 deletions(-)

> Index: linux-2.6/kernel/sched.c
> ===
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -7780,30 +7783,76 @@ unsigned long sched_group_shares(struct 
>  }
>  
>  /*
> - * Ensure the total rt_ratio <= sysctl_sched_rt_ratio
> + * Ensure that the real time constraints are schedulable.
>   */
> -int sched_group_set_rt_ratio(struct task_group *tg, unsigned long rt_ratio)
> +static DEFINE_MUTEX(rt_constraints_mutex);
> +
> +static unsigned long to_ratio(u64 period, u64 runtime)
> +{
> + if (runtime == RUNTIME_INF)
> + return 1ULL << 16;
> +
> + runtime *= (1ULL << 16);
> + do_div(runtime, period);

Isn't do_div() defined as taking (uint64_t, uint32_t) ?

> + return runtime;
> +}
> +
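
The concern raised above can be made concrete with a userspace sketch. The
kernel's do_div() divides a 64-bit dividend in place by a 32-bit divisor and
returns the remainder, so a u64 period passed as the divisor is only safe
while it fits in 32 bits. The helper names below are hypothetical stand-ins,
and the assert marks the assumption the quoted to_ratio() would be relying
on.

```c
#include <assert.h>
#include <stdint.h>

#define RUNTIME_INF UINT64_MAX	/* same value as the patch's (u64)~0ULL */

/*
 * Userspace stand-in for the do_div() contract: *n becomes the quotient,
 * the remainder is returned, and the divisor is only 32 bits wide.
 */
static uint32_t div_u64_by_u32(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* runtime / period with 16 fractional bits; 65536 means 100%. */
static uint64_t to_ratio_sketch(uint64_t period, uint64_t runtime)
{
	if (runtime == RUNTIME_INF)
		return 1ULL << 16;

	/* A wider period would be silently truncated by a 32-bit divisor. */
	assert(period != 0 && period <= UINT32_MAX);

	runtime *= (1ULL << 16);
	div_u64_by_u32(&runtime, (uint32_t)period);
	return runtime;
}
```

For scale: if the period were expressed in nanoseconds, it would stop
fitting in 32 bits once it exceeded roughly 4.29 seconds.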

> Index: linux-2.6/Documentation/sched-rt-group.txt
> ===
> --- /dev/null
> +++ linux-2.6/Documentation/sched-rt-group.txt
> @@ -0,0 +1,59 @@
> +
> +
> +Real-Time group scheduling.
> +
> +The problem space:
> +
> +In order to schedule multiple groups of realtime tasks each group must
> +be assigned a fixed portion of the cpu time available. Without a minimum

Use "cpu" or "CPU" consistently, please.  (I prefer CPU, but )

> +guarantee a realtime group can obviously fall short. A fuzzy upper limit
> +is of no use since it cannot be relied upon. Which leaves us with just
> +the single fixed portion.
> +
> +CPU time is divided by means of specifying how much time can be spend

s/spend/spent/

> +running in a given period. Say a frame fixed realtime renderer must
> +deliver a 25 frames a second, which yields a period of 0.04s. Now say

   drop "a"^

> +it will also have to play some music and respond to input, leaving it
> +with around 80% for the graphics. We can then give this group a runtime
> +of 0.8 * 0.04s = 0.032s.
> +
> +This way the graphics group will have a 0.04s period with a 0.032s runtime
> +limit.
> +
> +Now if the audio thread needs to refill the dma buffer every 0.005s, but

DMA preferably.

> +needs only about 3% cpu time to do so, it will can do with a 0.03 * 0.005s

   s/will can do/can do/

> += 0.00015s.
> +
> +
> +The Interface:
> +
> +system wide:
> +
> +/proc/sys/kernel/sched_rt_period_ms
> +/proc/sys/kernel/sched_rt_runtime_us
> +
> +CONFIG_FAIR_USER_SCHED
> +
> +/sys/kernel/uids/<uid>/cpu_rt_runtime_us
> +
> +or
> +
> +CONFIG_FAIR_CGROUP_SCHED
> +
> +/cgroup/<cgroup>/cpu.rt_runtime_us
> +
> +[ time is specified in us because the interface is s32, this gives an

s/,/;/

> +  operating range of ~35m to 1us ]
> +
> +The period takes values in [ 1, INT_MAX ], runtime in [ -1, INT_MAX - 1 ].
> +
> +A runtime of -1 specifies runtime == period, ie. no limit.
> +
> +New groups get the period from /proc/sys/kernel/sched_rt_period_us and
> +a runtime of 0.
> +
> +Settings are constrainted to:

constrained

> +
> +   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
> +
> +in order to keep the configuration schedulable.

---
~Randy


[PATCH 3/8] sched: rt-group: interface

2008-02-04 Thread Peter Zijlstra
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us.
This avoids picking a granularity for the ratio.

Extend the /sys/kernel/uids/<uid>/ interface to allow setting
the group's rt_runtime.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 Documentation/ABI/testing/sysfs-kernel-uids |6 +
 Documentation/sched-rt-group.txt|   59 +++
 include/linux/sched.h   |7 -
 kernel/sched.c  |  145 +---
 kernel/sched_rt.c   |   53 --
 kernel/sysctl.c |   32 +++---
 kernel/user.c   |   28 +
 7 files changed, 250 insertions(+), 80 deletions(-)

Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1507,8 +1507,6 @@ extern unsigned int sysctl_sched_child_r
 extern unsigned int sysctl_sched_features;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
-extern unsigned int sysctl_sched_rt_period;
-extern unsigned int sysctl_sched_rt_ratio;
 #if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
 extern unsigned int sysctl_sched_min_bal_int_shares;
 extern unsigned int sysctl_sched_max_bal_int_shares;
@@ -1518,6 +1516,8 @@ int sched_nr_latency_handler(struct ctl_
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
 #endif
+extern unsigned int sysctl_sched_rt_period;
+extern int sysctl_sched_rt_runtime;
 
 extern unsigned int sysctl_sched_compat_yield;
 
@@ -1997,6 +1997,9 @@ extern void sched_destroy_group(struct t
 extern void sched_move_task(struct task_struct *tsk);
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
 extern unsigned long sched_group_shares(struct task_group *tg);
+extern int sched_group_set_rt_runtime(struct task_group *tg,
+ long rt_runtime_us);
+extern long sched_group_rt_runtime(struct task_group *tg);
 
 #endif
 
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -176,7 +176,7 @@ struct task_group {
struct sched_rt_entity **rt_se;
struct rt_rq **rt_rq;
 
-   unsigned int rt_ratio;
+   u64 rt_runtime;
 
/*
 * shares assigned to a task group governs how much of cpu bandwidth
@@ -654,19 +654,21 @@ const_debug unsigned int sysctl_sched_fe
 const_debug unsigned int sysctl_sched_nr_migrate = 32;
 
 /*
- * period over which we measure -rt task cpu usage in ms.
+ * period over which we measure -rt task cpu usage in us.
  * default: 1s
  */
-const_debug unsigned int sysctl_sched_rt_period = 1000;
+unsigned int sysctl_sched_rt_period = 1000000;
 
-#define SCHED_RT_FRAC_SHIFT 16
-#define SCHED_RT_FRAC  (1UL << SCHED_RT_FRAC_SHIFT)
+/*
+ * part of the period that we allow rt tasks to run in us.
+ * default: 0.95s
+ */
+int sysctl_sched_rt_runtime = 950000;
 
 /*
- * ratio of time -rt tasks may consume.
- * default: 95%
+ * single value that denotes runtime == period, ie unlimited time.
  */
-const_debug unsigned int sysctl_sched_rt_ratio = 62259;
+#define RUNTIME_INF ((u64)~0ULL)
 
 /*
  * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
@@ -7191,7 +7193,8 @@ void __init sched_init(void)
&per_cpu(init_cfs_rq, i),
&per_cpu(init_sched_entity, i), i, 1);
 
-   init_task_group.rt_ratio = sysctl_sched_rt_ratio; /* XXX */
+   init_task_group.rt_runtime =
+   sysctl_sched_rt_runtime * NSEC_PER_USEC;
INIT_LIST_HEAD(&rq->leaf_rt_rq_list);
init_tg_rt_entry(rq, &init_task_group,
&per_cpu(init_rt_rq, i),
@@ -7586,7 +7589,7 @@ struct task_group *sched_create_group(vo
goto err;
 
tg->shares = NICE_0_LOAD;
-   tg->rt_ratio = 0; /* XXX */
+   tg->rt_runtime = 0;
 
for_each_possible_cpu(i) {
rq = cpu_rq(i);
@@ -7780,30 +7783,76 @@ unsigned long sched_group_shares(struct 
 }
 
 /*
- * Ensure the total rt_ratio <= sysctl_sched_rt_ratio
+ * Ensure that the real time constraints are schedulable.
  */
-int sched_group_set_rt_ratio(struct task_group *tg, unsigned long rt_ratio)
+static DEFINE_MUTEX(rt_constraints_mutex);
+
+static unsigned long to_ratio(u64 period, u64 runtime)
+{
+   if (runtime == RUNTIME_INF)
+   return 1ULL << 16;
+
+   runtime *= (1ULL << 16);
+   do_div(runtime, period);
+   return runtime;
+}
+
+static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
 {
struct task_group *tgi;
unsigned long total = 0;
+   unsigned long global_ratio =
+