Another idea: maybe instead of faking the cpuset cgroup we could just allow this controller in containers?

The main idea of faking/hiding cpuset was: cpuset is not virtualized (we don't have virtual processors), so a container can bind itself to physical CPUs and memory nodes. If several containers bind to the same CPU, they end up competing for its resources, which can hurt performance badly. https://jira.sw.ru/browse/PSBM-30541
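Just to illustrate the failure mode, here is a minimal userspace sketch (the mount path inside the CT is an assumption) that pins a cpuset to physical CPU 0; two CTs doing the same end up sharing that one core:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* Assumed path: wherever the CT's cpuset hierarchy is mounted. */
        FILE *f = fopen("/sys/fs/cgroup/cpuset/cpuset.cpus", "w");

        if (!f) {
                perror("fopen");
                return EXIT_FAILURE;
        }
        /* Bind the whole group to physical CPU 0; two containers both
         * writing "0" here compete for one core. */
        if (fputs("0", f) == EOF || fclose(f) == EOF) {
                perror("cpuset.cpus");
                return EXIT_FAILURE;
        }
        return 0;
}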

But AFAICS performance degrades only for containers that set up cpuset badly; all the others are still scheduled on all cores and are fine. So we would be protecting customers from themselves.

We could even add a feature to enable/disable cpuset per CT: e.g. vzctl sets ve.cpuset_enabled in the ve cgroup before the CT starts, and after that ctinit mounts cpuset in the CT if it is listed in /proc/cgroups (a rough sketch follows). Note we would also need to do the same on criu restore.
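Roughly, the ctinit side could look like this minimal sketch. The ve.cpuset_enabled knob, the mount target, and the idea that a hidden controller simply drops out of /proc/cgroups are all assumptions here, not the real ctinit code:

#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Return 1 if the cpuset controller is listed (and enabled) in
 * /proc/cgroups, 0 otherwise. */
static int cpuset_listed(void)
{
        FILE *f = fopen("/proc/cgroups", "r");
        char line[256];
        int found = 0;

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                char name[32];
                int hier, num, enabled;

                if (sscanf(line, "%31s %d %d %d",
                           name, &hier, &num, &enabled) == 4 &&
                    !strcmp(name, "cpuset") && enabled) {
                        found = 1;
                        break;
                }
        }
        fclose(f);
        return found;
}

int main(void)
{
        if (!cpuset_listed())
                return 0;
        /* Target directory must already exist inside the CT. */
        return mount("cgroup", "/sys/fs/cgroup/cpuset", "cgroup", 0, "cpuset");
}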

On 12/13/2017 07:52 PM, Stanislav Kinsburskiy wrote:
Any changes to this cgroup are skipped in a container, but a success code
is returned.
The idea is to fool Docker/Kubernetes.

https://jira.sw.ru/browse/PSBM-58423

This patch obsoletes "ve/proc/cpuset: do not show cpuset in CT"

v2:
Do not attach tasks in cpuset_change_cpumask on a cpuset change if it is
requested from a non-super VE.
This is the second part of the logic.
The first part was to not change the cpuset for a newly added task. This
one is to not set a new cpuset for all the tasks already in the cgroup.

Signed-off-by: Stanislav Kinsburskiy <skinsbur...@virtuozzo.com>
---
  kernel/cpuset.c |   12 ++++++++++++
  1 file changed, 12 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 26d88eb..43b1410 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -848,6 +848,9 @@ static int cpuset_test_cpumask(struct task_struct *tsk,
  static void cpuset_change_cpumask(struct task_struct *tsk,
                                  struct cgroup_scanner *scan)
  {
+       if (!ve_is_super(get_exec_env()))
+               return;
+

Likely we would have to do the same for the nodemask too if we choose to fake the cpuset cgroup, and maybe for some other files as well (a sketch follows the list below):

ls /sys/fs/cgroup/cpuset/cpuset.*
/sys/fs/cgroup/cpuset/cpuset.cpu_exclusive
/sys/fs/cgroup/cpuset/cpuset.cpus
/sys/fs/cgroup/cpuset/cpuset.mem_exclusive
/sys/fs/cgroup/cpuset/cpuset.mem_hardwall
/sys/fs/cgroup/cpuset/cpuset.memory_migrate
/sys/fs/cgroup/cpuset/cpuset.memory_pressure
/sys/fs/cgroup/cpuset/cpuset.memory_pressure_enabled
/sys/fs/cgroup/cpuset/cpuset.memory_spread_page
/sys/fs/cgroup/cpuset/cpuset.memory_spread_slab
/sys/fs/cgroup/cpuset/cpuset.mems
/sys/fs/cgroup/cpuset/cpuset.sched_load_balance
/sys/fs/cgroup/cpuset/cpuset.sched_relax_domain_level
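If we do fake them, here is a minimal sketch of the write side, modeled on the 3.10-era cpuset_write_resmask() handler (the exact name and prototype in our tree are an assumption), reusing the same check the patch adds elsewhere:

static int cpuset_write_resmask(struct cgroup *cgrp, struct cftype *cft,
                                const char *buf)
{
        /* Skip the change but report success for in-CT writes, so
         * Docker/Kubernetes keep working; cpus_allowed/mems_allowed
         * of the cpuset stay untouched. */
        if (!ve_is_super(get_exec_env()))
                return 0;

        /* ... the original cpus/mems update logic would run here ... */
        return 0;
}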

        set_cpus_allowed_ptr(tsk, ((cgroup_cs(scan->cg))->cpus_allowed));
  }
@@ -1441,6 +1444,9 @@ static int cpuset_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
        struct task_struct *task;
        int ret;

+       if (!ve_is_super(get_exec_env()))
+               return 0;
+
        mutex_lock(&cpuset_mutex);

        ret = -ENOSPC;
@@ -1470,6 +1476,9 @@ static int cpuset_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
  static void cpuset_cancel_attach(struct cgroup *cgrp,
                                 struct cgroup_taskset *tset)
  {
+       if (!ve_is_super(get_exec_env()))
+               return;
+
        mutex_lock(&cpuset_mutex);
        cgroup_cs(cgrp)->attach_in_progress--;
        mutex_unlock(&cpuset_mutex);
@@ -1494,6 +1503,9 @@ static void cpuset_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
        struct cpuset *cs = cgroup_cs(cgrp);
        struct cpuset *oldcs = cgroup_cs(oldcgrp);

+       if (!ve_is_super(get_exec_env()))
+               return;
+
        mutex_lock(&cpuset_mutex);

        /* prepare for attach */


--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.