On Mon, Feb 15, 2016 at 07:48:05PM +0000, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brau...@mailbox.org):
> > On Wed, Feb 10, 2016 at 05:45:48PM +0000, Serge Hallyn wrote:
> > > Quoting Christian Brauner (christian.brau...@mailbox.org):
> > > > On Mon, Feb 01, 2016 at 04:56:08AM +0000, Serge Hallyn wrote:
> > > > > Quoting Kevin Wilson (wkev...@gmail.com):
> > > > > > Hi, LXC developers,
> > > > > >
> > > > > > The latest kernel release (4.4) includes initial support for cgroup
> > > > > > v2 with 2 controllers (memory and io). It also seems that the PIDs
> > > > > > controller works in cgroup v2, but I do not know if it is officially
> > > > > > supported in v2.
> > > > > >
> > > > > > Is there any intention to replace the existing cgroup v1 usage in
> > > > > > LXC with cgroup v2, or at least to enable working with both of them?
> > > > > >
> > > > > > Regards,
> > > > > > Kevin
> > > > >
> > > > > Replace, no; support, yes. I've added support for it to cgmanager,
> > > > > and have used lxc with the unified hierarchy through cgmanager.
> > > > > Without cgmanager it will currently definitely not work. It's worth
> > > > > discussing how we should handle it - and how init wants us to handle
> > > > > it. With cgmanager I actually built in the support so that you could
> > > > > treat it as a legacy hierarchy, and upstart was happy with that since
> > > > > it used cgmanager. Systemd will not be happy with that, and it will be
> > > > > a problem. The only exception to the "no tasks in a non-leaf node"
> > > > > rule is for the / cgroup. So lxc would need to place init in, say,
> > > > > /lxc/c1/.leaf, and systemd would have to accept that /lxc/c1 is the
> > > > > container's cgroup. A few possibilities:
> > > > >
> > > > > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> > > >
> > > > Well, here is how I thought it could go (sticking to systemd specifics
> > > > here):
> > > > - create a slice for all lxc containers, "lxc.slice" (similar to the
> > > >   "machine.slice" used for systemd-nspawn-backed containers)
> > > > - "lxc.slice" contains a scope for each container (e.g. "c1.scope")
> > > > - "c1.scope" contains an "init.scope"
> > > > - "init.scope" only contains the PID of "/sbin/init" as seen from the
> > > >   host (obviously)
> > >
> > > So if we are creating container c1, are you talking about
> > >
> > > /lxc/c1/lxc.slice/c1.scope/init.scope
> > >
> > > or are you talking about a host-global
> > >
> > > /lxc.slice
> >
> > Yes, you have lxc.slice and then you have all your machines under it.
> > This is what systemd-nspawn does, if I'm not mistaken.
> >
> > > with container-specific
> > >
> > > /lxc.slice/c1.scope
> > >
> > > per container?
> > >
> > > ?
> >
> > Yes.
>
> This doesn't seem to address the problem. Where we put these on the host
> doesn't matter. The question is: when we create container c1, in which
> cgroup do we put the init process?
>
> Assume we create /lxc/c1 on the host as we do now. This becomes / in the
> container's cgroup namespace. Where do we put init? If we put it into
> (namespaced) /, then systemd will not be able to create any cgroups. So we
> should probably put it into /init.scope. This is fine with cgroup
> namespaces, since it can see it is in '/init.scope' (or '/' if an
> unprivileged container couldn't create a cgroup for some controllers).
> But if we do not have cgroup namespaces, systemd sees it is running in
> perhaps /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1/init.scope.
> In that case we want systemd to recognize init.scope and create services
> under /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1.
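To make the init.scope idea concrete, a minimal C sketch of the check Serge describes. The helper name is hypothetical and this is not code from systemd or LXC. It assumes init.scope is the agreed-upon marker leaf (a marker like .cg_leaf would work the same way): take the unified-hierarchy entry from /proc/self/cgroup, which has the form "0::<path>", and if the last path component is init.scope, treat the parent as the container's cgroup root.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not from systemd or LXC): given one line of
 * /proc/self/cgroup, take the unified-hierarchy entry ("0::<path>") and,
 * if its last path component is "init.scope", write the parent path,
 * i.e. the assumed container cgroup root, into root.
 * Returns 0 on success and -1 if the line is not a cgroup v2 entry, the
 * path does not end in /init.scope, or root is too small.
 */
static int container_cgroup_root(const char *line, char *root, size_t len)
{
    static const char leaf[] = "/init.scope"; /* assumed marker leaf */
    const size_t leaf_len = sizeof(leaf) - 1;
    const char *path;
    size_t path_len;

    assert(line != NULL && root != NULL);

    if (strncmp(line, "0::", 3) != 0)
        return -1;                      /* not the unified hierarchy */
    path = line + 3;
    path_len = strcspn(path, "\n");     /* ignore a trailing newline */

    if (path_len < leaf_len ||
        memcmp(path + path_len - leaf_len, leaf, leaf_len) != 0)
        return -1;                      /* init is not in init.scope */

    path_len -= leaf_len;               /* drop "/init.scope" */
    if (path_len == 0)
        path_len = 1;                   /* "/init.scope" -> "/" */
    if (path_len + 1 > len)
        return -1;                      /* output buffer too small */
    memcpy(root, path, path_len);
    root[path_len] = '\0';
    return 0;
}
```

For the non-namespaced path in Serge's example this yields /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1; inside a cgroup namespace, "0::/init.scope" yields "/".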
>
> > > > - All other processes are put in another slice "c1-something.slice"
> > >
> > > Which other processes?
> >
> > Well, all processes that systemd starts are put either in system.slice or
> > user.slice. All other things we start in the container (e.g. vim) are put
> > in a session.slice (e.g. session-0.slice, session-1000.slice).
>
> wc -l /sys/fs/cgroup/memory/tasks
> 548

This is output from a legacy cgroup hierarchy. (The tasks file was removed in
the unified cgroup hierarchy, no?) I was talking about unified cgroups.
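On the unified hierarchy the per-cgroup membership file to look at is cgroup.procs. A minimal C sketch (the helper name is hypothetical) of the equivalent of running `wc -l` on it. Note that cgroup.procs lists processes while the v1 tasks file lists threads, so the two counts are not directly comparable.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: count the PIDs listed in a cgroup's cgroup.procs
 * file (one PID per line), roughly what `wc -l cgroup.procs` reports.
 * A final line without a trailing newline is still counted.
 * Returns -1 if the file cannot be opened.
 */
static long count_procs(const char *procs_path)
{
    FILE *f;
    long n = 0;
    int c, prev = '\n';

    assert(procs_path != NULL);
    f = fopen(procs_path, "r");
    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')
            n++;
        prev = c;
    }
    if (prev != '\n')       /* last line had no trailing newline */
        n++;
    fclose(f);
    return n;
}
```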
A typical layout for a container BB that runs systemd with the unified cgroup
hierarchy, on a host that also uses the unified hierarchy, started with
systemd-nspawn:

/sys/fs/cgroup/machine.slice/:
  - non-leaf node --> cgroup.procs empty

/sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/:
  - non-leaf node --> cgroup.procs empty

The following are on the same level (under
/sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/):

- /sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/init.scope/:
  - leaf node --> cgroup.procs contains the PID of init
- /sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/system.slice/:
  - non-leaf node --> cgroup.procs empty
  - contains leaf nodes for system setup stuff (journald, logind, etc.)
- /sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/user.slice/user-0.slice/session-c1.scope and
  /sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/user.slice/user-0.slice/user@0.service:
  - filled with leaf nodes for, e.g., processes started by the user

> > >
> > > AFAIK all other processes will be created by systemd. The question is
> > > what it will do. If we put systemd in /lxc.slice/c1.scope/init.scope,
> > > will it take that as its cgroup root and try to create and move itself
> > > into /lxc.slice/c1.scope/init.scope? If so, it will fail, since it
> > > cannot create a cgroup while it is in it.
> >
> > I don't think so, but I need to test that again. Time to boot unified.
> >
> > >
> > > So I think I've convinced myself that we need to collaborate with
> > > systemd on this. Perhaps we can agree with it on a default cgroup in
> > > which it should be started, to tell it "this is the leaf cgroup for your
> > > init". So if it sees it is in /a/b/c/.cg_leaf, then it will know that
> > > /a/b/c is its root.
> >
> > I thought the same; that's why I started to read some of the code.
> > FWIW, systemd-nspawn already works with the unified cgroup hierarchy, and
> > I think nesting works as well. But I'm not completely sure how nspawn
> > handles nesting.
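A minimal C sketch (hypothetical helpers, assuming a cgroup2 mount such as /sys/fs/cgroup) of checking one cgroup directory against the convention annotated in the nspawn layout: a non-root cgroup that has child cgroups should have an empty cgroup.procs. In the kernel's cgroup v2 documentation this "no internal process" constraint is tied to controllers being enabled in the cgroup's cgroup.subtree_control, so the sketch checks the stricter convention from the layout rather than exactly what the kernel rejects.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* 1 if dir has at least one subdirectory (child cgroup), 0 if not,
 * -1 if dir cannot be opened. */
static int has_child_cgroups(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;
    char path[4096];
    int found = 0;

    if (!d)
        return -1;
    while (!found && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            found = 1;
    }
    closedir(d);
    return found;
}

/* 1 if dir/cgroup.procs lists at least one PID, 0 if it is empty,
 * -1 if it cannot be read. */
static int has_procs(const char *dir)
{
    char path[4096];
    FILE *f;
    int c;

    snprintf(path, sizeof(path), "%s/cgroup.procs", dir);
    f = fopen(path, "r");
    if (!f)
        return -1;
    c = fgetc(f);
    fclose(f);
    return c != EOF;
}

/* Hypothetical check: 1 if dir follows "non-leaf => cgroup.procs empty",
 * 0 if it has both children and processes, -1 on error. The root cgroup
 * is exempt, as Serge notes. */
static int respects_leaf_rule(const char *dir, int is_root)
{
    int children, procs;

    assert(dir != NULL);
    if (is_root)
        return 1;
    children = has_child_cgroups(dir);
    procs = has_procs(dir);
    if (children < 0 || procs < 0)
        return -1;
    return !(children && procs);
}
```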
>
> Looks like it puts systemd into '/supervisor' and the container into
> '/payload'? (nspawn-cgroup.c)

I don't think so. This seems to be a special case when systemd-nspawn is run
from a service unit. Otherwise the layout seems to be as I sketched above.

_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel