Thanks for posting this patchset, Peter. Based on the patch titled "sched: A quick and dirty cgroup tagging interface", I believe cgroups are used to define co-scheduling groups in this implementation.

Chrome OS engineers (kerr...@google.com, mpden...@google.com, and pal...@google.com) are considering an interface that is usable by unprivileged userspace apps. cgroups are a global resource that requires privileged access. Have you considered an interface that is akin to namespaces? Consider the following strawperson API proposal (I understand prctl() is generally used for process-specific actions, so we aren't married to using prctl()):

# API Properties

The kernel introduces coscheduling groups, which specify which processes may be executed together. An unprivileged process may use prctl() to create a coscheduling group. The process may then join the coscheduling group, and place any of its child processes into the coscheduling group. To provide flexibility for unrelated processes to join pre-existing groups, an IPC mechanism could send a coscheduling group handle between processes.

# Strawperson API Proposal

To create a new coscheduling group:

    int coscheduling_group = prctl(PR_CREATE_COSCHEDULING_GROUP);

The return value is >= 0 on success and -1 on failure, with the following possible values for errno:

ENOTSUP: This kernel doesn’t support the PR_CREATE_COSCHEDULING_GROUP operation.
EMFILE: The process’ kernel-side coscheduling group table is full.

To join a given process to the group:

    pid_t process = /* self or child... */;
    int status = prctl(PR_JOIN_COSCHEDULING_GROUP, coscheduling_group, process);
    if (status) {
        err(errno, NULL);
    }

The kernel will check and enforce that the given process ID really is the caller’s own PID or a PID of one of the caller’s children, and that the given group ID really exists. The return value is 0 on success and -1 on failure, with the following possible values for errno:

EPERM: The caller could not join the given process to the coscheduling group because it was not the creator of the given coscheduling group.
EPERM: The caller could not join the given process to the coscheduling group because the given process was not the caller or one of the caller’s children.
EINVAL: The given group ID did not exist in the kernel-side coscheduling group table associated with the caller.
ESRCH: The given process did not exist.

Regards,
Greg Kerr (kerr...@google.com)

On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <pet...@infradead.org> wrote:
>
>
> A much 'demanded' feature: core-scheduling :-(
>
> I still hate it with a passion, and that is part of why it took a little
> longer than 'promised'.
>
> While this one doesn't have all the 'features' of the previous (never
> published) version and isn't L1TF 'complete', I tend to like the structure
> better (relatively speaking: I hate it slightly less).
>
> This one is sched class agnostic and therefore, in principle, doesn't horribly
> wreck RT (in fact, RT could 'ab'use this by setting 'task->core_cookie = task'
> to force-idle siblings).
>
> Now, as hinted by that, there are semi sane reasons for actually having this.
> Various hardware features like Intel RDT - Memory Bandwidth Allocation, work
> per core (due to SMT fundamentally sharing caches) and therefore grouping
> related tasks on a core makes it more reliable.
>
> However; whichever way around you turn this cookie; it is expensive and nasty.
>
> It doesn't help that there are truly bonghit crazy proposals for using this out
> there, and I really hope to never see them in code.
>
> These patches are lightly tested and didn't insta explode, but no promises,
> they might just set your pets on fire.
>
> 'enjoy'
>
> @pjt; I know this isn't quite what we talked about, but this is where I ended
> up after I started typing. There's plenty design decisions to question and my
> changelogs don't even get close to beginning to cover them all. Feel free to
> ask.
>
> ---
>  include/linux/sched.h    |   9 +-
>  kernel/Kconfig.preempt   |   8 +-
>  kernel/sched/core.c      | 762 ++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/sched/deadline.c  |  99 +++---
>  kernel/sched/debug.c     |   4 +-
>  kernel/sched/fair.c      | 129 +++++---
>  kernel/sched/idle.c      |  42 ++-
>  kernel/sched/pelt.h      |   2 +-
>  kernel/sched/rt.c        |  96 +++---
>  kernel/sched/sched.h     | 183 ++++++++----
>  kernel/sched/stop_task.c |  35 ++-
>  kernel/sched/topology.c  |   4 +-
>  kernel/stop_machine.c    |   2 +
>  13 files changed, 1096 insertions(+), 279 deletions(-)
>
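
To make the checks in the strawperson proposal concrete, the following is a minimal userspace sketch of the create and join rules. It is not kernel code and does not call prctl(). Every name in it (model_state, model_create_group, model_join_group, the table sizes) is hypothetical. It also makes several assumptions the proposal does not state: a single shared group table rather than one per process, "children" meaning direct children only, and the order in which the error conditions are checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_MAX_GROUPS 4 /* assumed size of the group table */
#define MODEL_MAX_PROCS  8 /* assumed size of the process table */

struct model_proc {
    int pid;        /* positive process ID */
    int parent_pid; /* PID of the direct parent */
    int group;      /* coscheduling group, or -1 if none */
};

struct model_state {
    struct model_proc procs[MODEL_MAX_PROCS];
    size_t nprocs;
    int group_creator[MODEL_MAX_GROUPS]; /* creator PID, 0 marks a free slot */
};

/* Look up a process by PID; returns NULL if it does not exist. */
static struct model_proc *model_find(struct model_state *s, int pid)
{
    for (size_t i = 0; i < s->nprocs; i++)
        if (s->procs[i].pid == pid)
            return &s->procs[i];
    return NULL;
}

/* Models PR_CREATE_COSCHEDULING_GROUP: returns a group ID >= 0, or -1
 * with errno set to EMFILE when the table is full. */
int model_create_group(struct model_state *s, int caller)
{
    assert(caller > 0); /* 0 is reserved to mark free slots */
    for (int g = 0; g < MODEL_MAX_GROUPS; g++) {
        if (s->group_creator[g] == 0) {
            s->group_creator[g] = caller;
            return g;
        }
    }
    errno = EMFILE;
    return -1;
}

/* Models PR_JOIN_COSCHEDULING_GROUP: returns 0 on success, or -1 with
 * errno set. The order of the checks is an assumption of this sketch. */
int model_join_group(struct model_state *s, int caller, int group, int target)
{
    struct model_proc *p = model_find(s, target);

    if (p == NULL) {
        errno = ESRCH; /* target process does not exist */
        return -1;
    }
    if (group < 0 || group >= MODEL_MAX_GROUPS || s->group_creator[group] == 0) {
        errno = EINVAL; /* group ID is not in the table */
        return -1;
    }
    if (s->group_creator[group] != caller) {
        errno = EPERM; /* caller did not create this group */
        return -1;
    }
    if (target != caller && p->parent_pid != caller) {
        errno = EPERM; /* target is neither the caller nor its child */
        return -1;
    }
    p->group = group;
    return 0;
}
```

In this model a process with PID 100 that creates a group can join itself and its direct child to it, while an attempt to join an unrelated process, or an attempt by a different process to use the same group ID, fails with EPERM. Passing group handles between processes over IPC, as suggested in the API properties, is not modeled here.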