Re: [PATCH 14/14] sched: add sched_dl documentation.

Juri Lelli Fri, 08 Nov 2013 01:26:51 -0800

Hi,

On 11/07/2013 05:44 PM, Randy Dunlap wrote:
> Hi,
> 
> Just a few minor edits...
>


Thanks!

Best,

- Juri

> On 11/07/13 05:47, Juri Lelli wrote:
>> From: Dario Faggioli <raist...@linux.it>
>>
>> Add in Documentation/scheduler/ some hints about the design
>> choices, the usage and the future possible developments of the
>> sched_dl scheduling class and of the SCHED_DEADLINE policy.
>>
>> Signed-off-by: Dario Faggioli <raist...@linux.it>
>> Signed-off-by: Juri Lelli <juri.le...@gmail.com>
>> ---
>>  Documentation/scheduler/sched-deadline.txt |  196 ++++++++++++++++++++++++++++
>>  kernel/sched/deadline.c                    |    3 +-
>>  2 files changed, 198 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/scheduler/sched-deadline.txt
>>
>> diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
>> new file mode 100644
>> index 0000000..4d1ed52
>> --- /dev/null
>> +++ b/Documentation/scheduler/sched-deadline.txt
>> @@ -0,0 +1,196 @@
>> +                      Deadline Task Scheduling
>> +                      ------------------------
>> +
>> +CONTENTS
>> +========
>> +
>> +0. WARNING
>> +1. Overview
>> +2. Task scheduling
>> +3. Bandwidth management
>> +  3.1 System-wide settings
>> +  3.2 Task interface
>> +  3.3 Default behavior
>> +4. Tasks CPU affinity
>> +  4.1 SCHED_DEADLINE and cpusets HOWTO
>> +5. Future plans
>> +
>> +
>> +0. WARNING
>> +==========
>> +
>> + Fiddling with these settings can result in unpredictable or even unstable
>> + system behavior. As for -rt (group) scheduling, it is assumed that root users
>> + know what they're doing.
>> +
>> +
>> +1. Overview
>> +===========
>> +
>> + The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
>> + basically an implementation of the Earliest Deadline First (EDF) scheduling
>> + algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
>> + that makes it possible to isolate the behavior of tasks from each other.
>> +
>> +
>> +2. Task scheduling
>> +==================
>> +
>> + The typical -deadline task is composed of a computation phase (instance)
>> + which is activated in a periodic or sporadic fashion. The expected (maximum)
>> + duration of such computation is called the task's runtime; the time interval
>> + by which each instance needs to be completed is called the task's relative
>> + deadline. The task's absolute deadline is dynamically calculated as the
>> + time instant a task (or, more properly, a task instance) activates plus the
>> + relative deadline.
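The relation between an instance's activation time, the relative deadline and the resulting absolute deadline can be illustrated with a minimal Python sketch. The DeadlineTask class and its field names are hypothetical, chosen only for this illustration, and are not part of any kernel interface; times are assumed to be in microseconds.

```python
from dataclasses import dataclass


@dataclass
class DeadlineTask:
    """Hypothetical model of a -deadline task; all times in microseconds."""
    runtime: int       # expected (maximum) duration of each instance
    rel_deadline: int  # interval by which each instance must be completed

    def absolute_deadline(self, activation: int) -> int:
        # Absolute deadline = activation instant + relative deadline.
        return activation + self.rel_deadline


task = DeadlineTask(runtime=10_000, rel_deadline=100_000)
print(task.absolute_deadline(250_000))  # 350000
```

An instance activated at t = 250 ms thus has to complete by t = 350 ms; the next instance gets its own absolute deadline from its own activation time.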
>> +
>> + The EDF[1] algorithm selects the task with the smallest absolute deadline as
>> + the one to be executed first, while the CBS[2,3] ensures that each task runs
>> + for at most its runtime every period, avoiding any interference between
>> + different tasks (bandwidth isolation).
>> + Thanks to this feature, even tasks that do not strictly comply with the
>> + computational model described above can effectively use the new policy.
>> + In other words, there are no limitations on what kind of task can exploit
>> + this new scheduling discipline, even if it must be said that it is
>> + particularly suited for periodic or sporadic tasks that need guarantees on
>> + their timing behavior, e.g., multimedia, streaming, control applications.
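The interplay between EDF selection and per-task budget enforcement can be sketched as a tick-based Python simulation. This is a simplified, hard-reservation variant under stated assumptions: every task is always backlogged, its relative deadline equals its period, and a task that exhausts its runtime simply waits until its period ends. The CBS as described in [2,3] instead recharges the budget and postpones the deadline, so this is not the kernel's algorithm; DlTask, edf_pick() and simulate() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DlTask:
    """Hypothetical always-backlogged task; all times in microseconds.
    The relative deadline is assumed to be equal to the period."""
    name: str
    runtime: int       # budget granted for each period
    period: int        # replenishment period
    budget: int = 0    # runtime left in the current period
    deadline: int = 0  # current absolute deadline

    def replenish(self, now: int) -> None:
        # Start a new instance: full budget, deadline one period later.
        self.budget = self.runtime
        self.deadline = now + self.period


def edf_pick(tasks):
    """EDF: among tasks with budget left, pick the earliest absolute deadline."""
    runnable = [t for t in tasks if t.budget > 0]
    return min(runnable, key=lambda t: t.deadline, default=None)


def simulate(tasks, horizon, tick=1_000):
    """Return which task ran in each tick ("idle" if none was eligible)."""
    for t in tasks:
        t.replenish(0)
    schedule = []
    for now in range(0, horizon, tick):
        for t in tasks:
            if now >= t.deadline:  # period over: refill the budget
                t.replenish(now)
        cur = edf_pick(tasks)
        if cur is not None:
            cur.budget -= tick  # charge the running task (budget enforcement)
        schedule.append(cur.name if cur is not None else "idle")
    return schedule


print(simulate([DlTask("A", 2_000, 5_000), DlTask("B", 3_000, 10_000)], 10_000))
```

With A asking for 2 ms every 5 ms (40%) and B for 3 ms every 10 ms (30%), the sketch prints A, A, B, B, B, A, A, idle, idle, idle: EDF runs A first because of its earlier deadline, but A is stopped once its 2 ms are used, so B still gets its share within its own deadline.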
>> +
>> + References:
>> +  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming
>> +      in a hard-real-time environment. Journal of the Association for
>> +      Computing Machinery, 20(1), 1973.
>> +  2 - L. Abeni and G. Buttazzo. Integrating Multimedia Applications in Hard
>> +      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
>> +      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
>> +  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
>> +      Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
>> +
>> +3. Bandwidth management
>> +=======================
>> +
>> + In order for -deadline scheduling to be effective and useful, it is
>> + important to have some method of keeping the allocation of the available
>> + CPU bandwidth to tasks under control.
>> + This is usually called "admission control"; if it is not performed at all,
>> + no guarantee can be given on the actual scheduling of the -deadline tasks.
>> +
>> + Since RT-throttling was introduced, each task group has had an associated
>> + bandwidth, calculated as a certain amount of runtime over a period.
>> + Moreover, to make it possible to manipulate such bandwidth, readable/writable
>> + controls have been added to both procfs (for system-wide settings) and
>> + cgroupfs (for per-group settings).
>> + Therefore, the same interface is being used for controlling the bandwidth
>> + distribution to -deadline tasks and task groups, i.e., new controls with
>> + similar names, equivalent meaning and the same usage paradigm are added.
>> +
>> + However, more discussion is needed in order to figure out how we want to
>> + manage SCHED_DEADLINE bandwidth at the task group level. Therefore,
>> + SCHED_DEADLINE uses (for now) a less sophisticated, but actually very
>> + sensible, mechanism to ensure that a certain utilization cap is not exceeded
>> + in each root_domain.
>> +
>> + Another main difference between deadline bandwidth management and
>> + RT-throttling is that -deadline tasks have bandwidth on their own (while -rt
>> + ones don't!), and thus we don't need a higher level throttling mechanism to
>> + enforce the desired bandwidth.
>> +
>> +3.1 System-wide settings
>> +------------------------
>> +
>> + The system-wide settings are configured under the /proc virtual file system.
>> +
>> + The control knob that is added to the /proc virtual file system is
>> + /proc/sys/kernel/sched_dl_runtime_us. It accepts (if written) and provides
>> + (if read) the new runtime for each CPU in each root_domain. The period
>> + control knob is instead shared with -rt settings
>> + (/proc/sys/kernel/sched_rt_period_us).
>> +
>> + The CPU bandwidth available to -deadline tasks is actually a sub-quota of
>> + the -rt bandwidth. By default 95% of system bandwidth is allocated to -rt tasks;
>> + among this, a 40% quota is reserved for -dl tasks. To have the actual quota a
> 
> s/among/within/
> 
>> + simple multiplication is needed: .95 * .40 = .38 (38% of system bandwidth for
>> + deadline tasks).
>> +
>> + This means that, for a root_domain comprising M CPUs, -deadline tasks
>> + can be created until the sum of their bandwidths stay below:
> 
>                    while                             stays
> 
>> +
>> +   M * (sched_dl_runtime_us * rt_bw)
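Taking the default shares stated above (95% of system bandwidth for -rt, 40% of that for -deadline), the cap for a root_domain of M CPUs works out to M * .38, and admission control reduces to comparing the summed runtime/period ratios against it. The following Python sketch assumes exactly those defaults; dl_cap() and admit() are hypothetical helpers, and treating a sum exactly equal to the cap as admissible is an assumption. Fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction

RT_SHARE = Fraction(95, 100)  # default share of system bandwidth for -rt tasks
DL_SHARE = Fraction(40, 100)  # default share of the -rt bandwidth for -deadline tasks


def dl_cap(n_cpus: int) -> Fraction:
    # -deadline bandwidth is a sub-quota of the -rt bandwidth: M * .95 * .40
    return n_cpus * RT_SHARE * DL_SHARE


def admit(tasks, n_cpus: int) -> bool:
    """tasks: (runtime_us, period_us) pairs already in the root_domain plus the
    candidate. Accept only if the summed bandwidth does not exceed the cap
    (equality counted as admissible, by assumption)."""
    total = sum(Fraction(runtime, period) for runtime, period in tasks)
    return total <= dl_cap(n_cpus)


print(dl_cap(2))                                        # 19/25, i.e. 0.76
print(admit([(10_000, 100_000), (30_000, 50_000)], 2))  # 0.7 <= 0.76: True
print(admit([(10_000, 100_000), (30_000, 50_000), (10_000, 100_000)], 2))  # 0.8: False
```

On a 2-CPU root_domain, a third task that pushes the total from 0.7 to 0.8 would be rejected, because 0.8 exceeds 2 * .38 = .76.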
>> +
>> + It is also possible to disable this bandwidth management logic, and
>> + thus be free to oversubscribe the system up to any arbitrary level.
>> + This is done by writing -1 to /proc/sys/kernel/sched_dl_runtime_us or
>> + to /proc/sys/kernel/sched_rt_runtime_us.
>> +
>> +
>> +3.2 Task interface
>> +------------------
>> +
>> + Specifying a periodic/sporadic task that executes for a given amount of
>> + runtime at each instance, and that is scheduled according to the urgency of
>> + its own timing constraints needs, in general, a way of declaring:
>> +  - a (maximum/typical) instance execution time,
>> +  - a minimum interval between consecutive instances,
>> +  - a time constraint by which each instance must be completed.
>> +
>> + Therefore:
>> +  * a new struct sched_param2, containing all the necessary fields, is
>> +    provided;
>> +  * the new scheduling-related syscalls that manipulate it, i.e.,
>> +    sched_setscheduler2(), sched_setparam2() and sched_getparam2(),
>> +    are implemented.
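The three values listed above can be modelled as a small record with a consistency check, as in the Python sketch below. The DlParams class and its field names are hypothetical and do not reflect the layout of struct sched_param2; the rule 0 < runtime <= deadline <= period is an assumed sanity constraint (an instance must fit before its deadline, and the deadline should not exceed the minimum interval between instances), not a description of the checks the new syscalls perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlParams:
    """Hypothetical container for the three values listed above, in
    microseconds. Field names are illustrative, not struct sched_param2's."""
    runtime_us: int   # (maximum/typical) instance execution time
    deadline_us: int  # time by which each instance must be completed
    period_us: int    # minimum interval between consecutive instances

    def is_consistent(self) -> bool:
        # Assumed sanity rule: an instance must fit before its deadline,
        # and the deadline must not exceed the minimum inter-arrival time.
        return 0 < self.runtime_us <= self.deadline_us <= self.period_us

    def bandwidth(self) -> float:
        # Share of one CPU requested by the task: runtime over period.
        return self.runtime_us / self.period_us


video = DlParams(runtime_us=10_000, deadline_us=30_000, period_us=40_000)
print(video.is_consistent(), video.bandwidth())  # True 0.25
```

The bandwidth() value is the per-task quantity that the admission control described in section 3 would sum up.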
>> +
>> +
>> +3.3 Default behavior
>> +--------------------
>> +
>> +The default value for SCHED_DEADLINE bandwidth is to have dl_runtime equal to
>> +40000. Being rt_period equal to 1000000, by default, it means that -deadline
> 
>           With rt_period equal to 1000000,
> 
>> +tasks can use at most 40%, multiplied by the number of CPUs that compose the
>> +root_domain, for each root_domain.
>> +
>> +A -deadline task cannot fork.
>> +
>> +4. Tasks CPU affinity
>> +=====================
>> +
>> +-deadline tasks cannot have an affinity mask smaller than the entire
>> +root_domain they are created on. However, affinities can be specified
>> +through the cpuset facility (Documentation/cgroups/cpusets.txt).
>> +
>> +4.1 SCHED_DEADLINE and cpusets HOWTO
>> +------------------------------------
>> +
>> +An example of a simple configuration (pin a -deadline task to CPU0)
>> +follows (rt-app is used to create a -deadline task).
>> +
>> +mkdir /dev/cpuset
>> +mount -t cgroup -o cpuset cpuset /dev/cpuset
>> +cd /dev/cpuset
>> +mkdir cpu0
>> +echo 0 > cpu0/cpuset.cpus
>> +echo 0 > cpu0/cpuset.mems
>> +echo 1 > cpuset.cpu_exclusive
>> +echo 0 > cpuset.sched_load_balance
>> +echo 1 > cpu0/cpuset.cpu_exclusive
>> +echo 1 > cpu0/cpuset.mem_exclusive
>> +echo $$ > cpu0/tasks
>> +rt-app -t 100000:10000:d:0 -D5  # specifying task affinity is now superfluous
>> +
>> +5. Future plans
>> +===============
>> +
>> +Still missing:
>> +
>> + - refinements to deadline inheritance, especially regarding the possibility
>> +   of retaining bandwidth isolation among non-interacting tasks. This is
>> +   being studied from both theoretical and practical point of views, and
> 
>                                                         points of view,
> 
>> +   hopefully we should be able to produce some demonstrative code soon;
>> + - (c)group based bandwidth management, and maybe scheduling;
>> + - access control for non-root users (and related security concerns to
>> +   address): what is the best way to allow unprivileged use of the
>> +   mechanisms, and how can non-root users be prevented from "cheating" the
>> +   system?
>> +
>> +As already discussed, we are also planning to merge this work with the EDF
>> +throttling patches [https://lkml.org/lkml/2010/2/23/239], but we are still in
>> +the preliminary phases of the merge and we really seek feedback that would
>> +help us decide on the direction it should take.
> 
> 