Re: [lxc-devel] cgroup V2 and LXC

2016-02-24 Thread Serge Hallyn
Quoting Christian Brauner (christianvanbrau...@gmail.com):
> On Mon, Feb 15, 2016 at 07:48:05PM +, Serge Hallyn wrote:
> > Quoting Christian Brauner (christian.brau...@mailbox.org):
> > > On Wed, Feb 10, 2016 at 05:45:48PM +, Serge Hallyn wrote:
> > > > Quoting Christian Brauner (christian.brau...@mailbox.org):
> > > > > On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> > > > > > Quoting Kevin Wilson (wkev...@gmail.com):
> > > > > > > Hi, LXC developers,
> > > > > > > 
> > > > > > > The latest kernel release (4.4) includes initial support to 
> > > > > > > cgroup v2
> > > > > > > with 2 controllers (memory and io). Also it seems that the PIDs
> > > > > > > controller works in cgroup v2, but I do not know if it is 
> > > > > > > officially
> > > > > > > supported in v2.
> > > > > > > 
> > > > > > > Is there any intention to replace the existing cgroup v1 usage in 
> > > > > > > LXC
> > > > > > > by cgroup v2 ? or at least to enable working with both of them ?
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Kevin
> > > > > > 
> > > > > > Replace, no, support, yes.  I've added support for it to cgmanager, 
> > > > > > and have
> > > > > > used lxc with the unified hierarchy through cgmanager.  Without 
> > > > > > cgmanager
> > > > > > it will currently definitely not work.  It's worth discussing how 
> > > > > > we should
> > > > > > handle it - and how init wants us to handle it.   With cgmanager I 
> > > > > > actually
> > > > > > built in the support so that you could treat it as a legacy 
> > > > > > hierarchy, and
> > > > > > upstart was happy with that since it used cgmanager.  Systemd will 
> > > > > > not be
> > > > > > happy with that, and it will be a problem.  The only exception to 
> > > > > > the "no
> > > > > > tasks in a non-leaf node" rule is for the / cgroup.  So lxc would 
> > > > > > need to
> > > > > > place init in say /lxc/c1/.leaf, and systemd would have to accept 
> > > > > > that
> > > > > > /lxc/c1 is the container's cgroup.  A few possibilities:
> > > > > > 
> > > > > > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> > > > > Well, here is how I thought it could go (sticking to systemd 
> > > > > specifics here):
> > > > > - create a slice for all lxc "lxc.slice" (similar to 
> > > > > "machine.slice" of
> > > > >   systemd-nspawn backed containers)
> > > > > - "lxc.slice" contains a scope for each container (e.g. 
> > > > > "c1.scope")
> > > > > - "c1.scope" contains an "init.scope"
> > > > > - "init.scope" only contains the PID of "/sbin/init" as seen 
> > > > > from the
> > > > >   host (obviously)
> > > > 
> > > > So if we are creating container c1, are you talking about
> > > > 
> > > > /lxc/c1/lxc.slice/c1.scope/init.scope
> > > > 
> > > > or are you talking about a host-global
> > > > 
> > > > /lxc.slice
> > > Yes, you have lxc.slice then you have all your machines under this. This 
> > > is what
> > > systemd-nspawn does if I'm not mistaken.
> > > > with container-specific
> > > > 
> > > > /lxc.slice/c1.scope
> > > > 
> > > > per container?
> > > > 
> > > > ?
> > > Yes.
> > 
> > This doesn't seem to address the problem.  Where we put these on the host 
> > doesn't
> > matter.  The question is, we create container c1, in which cgroup do we put 
> > the
> > init process?
> > 
> > Assume we create /lxc/c1 on the host as we do now.  This becomes / in the 
> > container's
> > cgroup namespace.  Where do we put init?  If we put it into (namespaced) /, 
> > then
> > systemd will not be able to create any cgroups.  So we should probably put 
> > it into
> > /init.scope.  This is fine with cgroup namespaces since it can see it is in 
> > '/init.scope'
> > (or '/' if an unprivileged container couldn't create a cgroup for some 
> > controllers).
> > But if we do not have cgroup namespaces, systemd sees it is running in 
> > perhaps
> > /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1/init.scope.  
> > In that
> > case we want systemd to recognize init.scope and create services under
> > /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1.
> > 
> > > > > - All other processes are put in another slice 
> > > > > "c1-something.slice"
> > > > 
> > > > Which other processes?
> > > Well, all processes, systemd starts are either put in system.slice or
> > > user.slice. All other things we start in the container (let it be e.g. 
> > > vim) is
> > > put in a session.slice (e.g. session-0.slice, session-1000.slice).
> > 
> > wc -l /sys/fs/cgroup/memory/tasks
> > 548
> This is output from a legacy cgroup. (The tasks file is removed in cgroup
> unified hierarchy, no?) I was talking about unified cgroups.

Oh, of course.

> A typical layout for a container BB that runs a unified cgroup hierarchy
> inside, on a host that also runs a unified cgroup hierarchy, with
> systemd-nspawn:
> 
> /sys/fs/cgroup/machine.slice/:
> - non-leaf node --> cgroup.procs empty
> 
> 

Re: [lxc-devel] cgroup V2 and LXC

2016-02-23 Thread Christian Brauner
On Mon, Feb 15, 2016 at 07:48:05PM +, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brau...@mailbox.org):
> > On Wed, Feb 10, 2016 at 05:45:48PM +, Serge Hallyn wrote:
> > > Quoting Christian Brauner (christian.brau...@mailbox.org):
> > > > On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> > > > > Quoting Kevin Wilson (wkev...@gmail.com):
> > > > > > Hi, LXC developers,
> > > > > > 
> > > > > > The latest kernel release (4.4) includes initial support to cgroup 
> > > > > > v2
> > > > > > with 2 controllers (memory and io). Also it seems that the PIDs
> > > > > > controller works in cgroup v2, but I do not know if it is officially
> > > > > > supported in v2.
> > > > > > 
> > > > > > Is there any intention to replace the existing cgroup v1 usage in 
> > > > > > LXC
> > > > > > by cgroup v2 ? or at least to enable working with both of them ?
> > > > > > 
> > > > > > Regards,
> > > > > > Kevin
> > > > > 
> > > > > Replace, no, support, yes.  I've added support for it to cgmanager, 
> > > > > and have
> > > > > used lxc with the unified hierarchy through cgmanager.  Without 
> > > > > cgmanager
> > > > > it will currently definitely not work.  It's worth discussing how we 
> > > > > should
> > > > > handle it - and how init wants us to handle it.   With cgmanager I 
> > > > > actually
> > > > > built in the support so that you could treat it as a legacy 
> > > > > hierarchy, and
> > > > > upstart was happy with that since it used cgmanager.  Systemd will 
> > > > > not be
> > > > > happy with that, and it will be a problem.  The only exception to the 
> > > > > "no
> > > > > tasks in a non-leaf node" rule is for the / cgroup.  So lxc would 
> > > > > need to
> > > > > place init in say /lxc/c1/.leaf, and systemd would have to accept that
> > > > > /lxc/c1 is the container's cgroup.  A few possibilities:
> > > > > 
> > > > > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> > > > Well, here is how I thought it could go (sticking to systemd specifics 
> > > > here):
> > > > - create a slice for all lxc "lxc.slice" (similar to 
> > > > "machine.slice" of
> > > >   systemd-nspawn backed containers)
> > > > - "lxc.slice" contains a scope for each container (e.g. 
> > > > "c1.scope")
> > > > - "c1.scope" contains an "init.scope"
> > > > - "init.scope" only contains the PID of "/sbin/init" as seen 
> > > > from the
> > > >   host (obviously)
> > > 
> > > So if we are creating container c1, are you talking about
> > > 
> > > /lxc/c1/lxc.slice/c1.scope/init.scope
> > > 
> > > or are you talking about a host-global
> > > 
> > > /lxc.slice
> > Yes, you have lxc.slice then you have all your machines under this. This is 
> > what
> > systemd-nspawn does if I'm not mistaken.
> > > with container-specific
> > > 
> > > /lxc.slice/c1.scope
> > > 
> > > per container?
> > > 
> > > ?
> > Yes.
> 
> This doesn't seem to address the problem.  Where we put these on the host 
> doesn't
> matter.  The question is, we create container c1, in which cgroup do we put 
> the
> init process?
> 
> Assume we create /lxc/c1 on the host as we do now.  This becomes / in the 
> container's
> cgroup namespace.  Where do we put init?  If we put it into (namespaced) /, 
> then
> systemd will not be able to create any cgroups.  So we should probably put it 
> into
> /init.scope.  This is fine with cgroup namespaces since it can see it is in 
> '/init.scope'
> (or '/' if an unprivileged container couldn't create a cgroup for some 
> controllers).
> But if we do not have cgroup namespaces, systemd sees it is running in perhaps
> /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1/init.scope.  
> In that
> case we want systemd to recognize init.scope and create services under
> /user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1.
> 
> > > > - All other processes are put in another slice 
> > > > "c1-something.slice"
> > > 
> > > Which other processes?
> > Well, all processes, systemd starts are either put in system.slice or
> > user.slice. All other things we start in the container (let it be e.g. vim) 
> > is
> > put in a session.slice (e.g. session-0.slice, session-1000.slice).
> 
> wc -l /sys/fs/cgroup/memory/tasks
> 548
This is output from a legacy cgroup hierarchy. (The tasks file is removed in
the cgroup unified hierarchy, no?) I was talking about unified cgroups.
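
For comparison, a rough unified-hierarchy counterpart of that wc -l is to
count the lines of cgroup.procs instead of tasks. This is only a sketch:
count_procs is a made-up helper name, and the directory passed in is an
assumption about where the unified hierarchy is mounted. Note that
cgroup.procs lists processes while the legacy tasks file lists threads, so
the two numbers are not directly comparable.

```python
import os


def count_procs(cgroup_dir):
    """Count the PIDs listed in <cgroup_dir>/cgroup.procs.

    cgroup_dir is whatever cgroup directory the caller wants to inspect,
    e.g. a directory below the unified hierarchy mount point (assumed,
    not checked here).
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        # One PID per line; ignore blank lines.
        return sum(1 for line in f if line.strip())
```

Calling count_procs() on a non-leaf cgroup such as machine.slice would be
expected to return 0, matching the "cgroup.procs empty" entries in the
layout.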

A typical layout for a container BB that runs a unified cgroup hierarchy
inside, on a host that also runs a unified cgroup hierarchy, with
systemd-nspawn:

/sys/fs/cgroup/machine.slice/:
- non-leaf node --> cgroup.procs empty

/sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/:
- non-leaf node --> cgroup.procs empty

The following are on the same level: 
(/sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/)

- /sys/fs/cgroup/machine.slice/machine-BB\x2dtree.scope/init.scope/:
  - leaf node --> cgroup.procs contains PID of init

- 

Re: [lxc-devel] cgroup V2 and LXC

2016-02-15 Thread Serge Hallyn
Quoting Christian Brauner (christian.brau...@mailbox.org):
> On Wed, Feb 10, 2016 at 05:45:48PM +, Serge Hallyn wrote:
> > Quoting Christian Brauner (christian.brau...@mailbox.org):
> > > On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> > > > Quoting Kevin Wilson (wkev...@gmail.com):
> > > > > Hi, LXC developers,
> > > > > 
> > > > > The latest kernel release (4.4) includes initial support to cgroup v2
> > > > > with 2 controllers (memory and io). Also it seems that the PIDs
> > > > > controller works in cgroup v2, but I do not know if it is officially
> > > > > supported in v2.
> > > > > 
> > > > > Is there any intention to replace the existing cgroup v1 usage in LXC
> > > > > by cgroup v2 ? or at least to enable working with both of them ?
> > > > > 
> > > > > Regards,
> > > > > Kevin
> > > > 
> > > > Replace, no, support, yes.  I've added support for it to cgmanager, and 
> > > > have
> > > > used lxc with the unified hierarchy through cgmanager.  Without 
> > > > cgmanager
> > > > it will currently definitely not work.  It's worth discussing how we 
> > > > should
> > > > handle it - and how init wants us to handle it.   With cgmanager I 
> > > > actually
> > > > built in the support so that you could treat it as a legacy hierarchy, 
> > > > and
> > > > upstart was happy with that since it used cgmanager.  Systemd will not 
> > > > be
> > > > happy with that, and it will be a problem.  The only exception to the 
> > > > "no
> > > > tasks in a non-leaf node" rule is for the / cgroup.  So lxc would need 
> > > > to
> > > > place init in say /lxc/c1/.leaf, and systemd would have to accept that
> > > > /lxc/c1 is the container's cgroup.  A few possibilities:
> > > > 
> > > > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> > > Well, here is how I thought it could go (sticking to systemd specifics 
> > > here):
> > > - create a slice for all lxc "lxc.slice" (similar to 
> > > "machine.slice" of
> > >   systemd-nspawn backed containers)
> > > - "lxc.slice" contains a scope for each container (e.g. "c1.scope")
> > > - "c1.scope" contains an "init.scope"
> > > - "init.scope" only contains the PID of "/sbin/init" as seen from 
> > > the
> > >   host (obviously)
> > 
> > So if we are creating container c1, are you talking about
> > 
> > /lxc/c1/lxc.slice/c1.scope/init.scope
> > 
> > or are you talking about a host-global
> > 
> > /lxc.slice
> Yes, you have lxc.slice then you have all your machines under this. This is 
> what
> systemd-nspawn does if I'm not mistaken.
> > with container-specific
> > 
> > /lxc.slice/c1.scope
> > 
> > per container?
> > 
> > ?
> Yes.

This doesn't seem to address the problem.  Where we put these on the host
doesn't matter.  The question is: when we create container c1, in which
cgroup do we put the init process?

Assume we create /lxc/c1 on the host as we do now.  This becomes / in the
container's cgroup namespace.  Where do we put init?  If we put it into
(namespaced) /, then systemd will not be able to create any cgroups, because
the namespaced / is really /lxc/c1 on the host, which would become a non-leaf
node holding a task.  So we should probably put it into /init.scope.  This is
fine with cgroup namespaces, since systemd can see it is in '/init.scope'
(or '/' if an unprivileged container couldn't create a cgroup for some
controllers).  But if we do not have cgroup namespaces, systemd sees it is
running in perhaps
/user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1/init.scope.
In that case we want systemd to recognize init.scope and create services under
/user.slice/user-1000.slice/session-c6.scope/lxc/lxdvm1/lxc/c1.
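
To make the two cases concrete, here is a minimal sketch of what an init
could do with the contents of /proc/self/cgroup. It assumes the unified
hierarchy shows up as a "0::<path>" line, and the function names
(unified_path, services_root) are invented for illustration; this is not
what systemd actually does.

```python
import posixpath


def unified_path(proc_cgroup_text):
    """Return the unified-hierarchy path from /proc/<pid>/cgroup-style text.

    The unified hierarchy is assumed to appear as a line of the form
    "0::<path>". Returns None if no such line is present.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def services_root(path, init_leaf="init.scope"):
    """If init sits in a trailing init.scope, its parent is the root
    under which services would be created; otherwise use the path as is."""
    parent, last = posixpath.split(path)
    return parent if last == init_leaf else path


# With a cgroup namespace, init sees a short path.
with_cgns = services_root(unified_path("0::/init.scope\n"))

# Without one, it sees the full host path.
without_cgns = services_root(unified_path(
    "0::/user.slice/user-1000.slice/session-c6.scope"
    "/lxc/lxdvm1/lxc/c1/init.scope\n"))
```

Here with_cgns comes out as "/" and without_cgns as the long host path
ending in /lxc/c1, i.e. the same logic covers both cases.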

> > > - All other processes are put in another slice 
> > > "c1-something.slice"
> > 
> > Which other processes?
> Well, all processes, systemd starts are either put in system.slice or
> user.slice. All other things we start in the container (let it be e.g. vim) is
> put in a session.slice (e.g. session-0.slice, session-1000.slice).

wc -l /sys/fs/cgroup/memory/tasks
548

> > AFAIK all other processes will be created by systemd.  The q is what will it
> > do.  If we put systemd in /lxc.slice/c1.scope/init.scope, will it take that
> > as its cgroup root and try to create and move itself into
> > /lxc.slice/c1.scope/init.scope ?  If so it will fail since it cannot create 
> > a
> > cgroup while it is in it.
> I don't think so but I need to test that again. Time to boot unified.
> 
> > 
> > So I think I've convinced myself that we need to collaborate with systemd
> > on this.  Perhaps we can agree with it on a default cgroup in which it 
> > should
> > be started to tell it "this is the leaf cgroup for your init".  So if it 
> > sees
> > it is in /a/b/c/.cg_leaf, then it will know that /a/b/c is its root.
> I thought the same that's why I started to read some of the code.
> fwiw, systemd-nspawn already works with the unified cgroup hierarchy and I 
> think
> nesting works as well. But I'm not completely sure how nspawn handles nesting.

Looks like it puts systemd into 

Re: [lxc-devel] cgroup V2 and LXC

2016-02-10 Thread Serge Hallyn
Quoting Christian Brauner (christian.brau...@mailbox.org):
> On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> > Quoting Kevin Wilson (wkev...@gmail.com):
> > > Hi, LXC developers,
> > > 
> > > The latest kernel release (4.4) includes initial support to cgroup v2
> > > with 2 controllers (memory and io). Also it seems that the PIDs
> > > controller works in cgroup v2, but I do not know if it is officially
> > > supported in v2.
> > > 
> > > Is there any intention to replace the existing cgroup v1 usage in LXC
> > > by cgroup v2 ? or at least to enable working with both of them ?
> > > 
> > > Regards,
> > > Kevin
> > 
> > Replace, no, support, yes.  I've added support for it to cgmanager, and have
> > used lxc with the unified hierarchy through cgmanager.  Without cgmanager
> > it will currently definitely not work.  It's worth discussing how we should
> > handle it - and how init wants us to handle it.   With cgmanager I actually
> > built in the support so that you could treat it as a legacy hierarchy, and
> > upstart was happy with that since it used cgmanager.  Systemd will not be
> > happy with that, and it will be a problem.  The only exception to the "no
> > tasks in a non-leaf node" rule is for the / cgroup.  So lxc would need to
> > place init in say /lxc/c1/.leaf, and systemd would have to accept that
> > /lxc/c1 is the container's cgroup.  A few possibilities:
> > 
> > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> Well, here is how I thought it could go (sticking to systemd specifics here):
> - create a slice for all lxc "lxc.slice" (similar to "machine.slice" 
> of
>   systemd-nspawn backed containers)
> - "lxc.slice" contains a scope for each container (e.g. "c1.scope")
> - "c1.scope" contains an "init.scope"
> - "init.scope" only contains the PID of "/sbin/init" as seen from the
>   host (obviously)

So if we are creating container c1, are you talking about

/lxc/c1/lxc.slice/c1.scope/init.scope

or are you talking about a host-global

/lxc.slice

with container-specific

/lxc.slice/c1.scope

per container?

?

> - All other processes are put in another slice "c1-something.slice"

Which other processes?

AFAIK all other processes will be created by systemd.  The question is what
it will do.  If we put systemd in /lxc.slice/c1.scope/init.scope, will it take
that as its cgroup root and try to create and move itself into
/lxc.slice/c1.scope/init.scope ?  If so, it will fail, since it cannot create
a cgroup while it is in it.

So I think I've convinced myself that we need to collaborate with systemd
on this.  Perhaps we can agree with it on a default cgroup in which it should
be started to tell it "this is the leaf cgroup for your init".  So if it sees
it is in /a/b/c/.cg_leaf, then it will know that /a/b/c is its root.
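
A minimal sketch of that convention, from the init side. The marker name
".cg_leaf" is just the example from the paragraph above, not something
anyone has agreed on, and cgroup_root is a hypothetical helper.

```python
import posixpath

# Example marker name from the discussion; purely an assumption.
LEAF_MARKER = ".cg_leaf"


def cgroup_root(own_cgroup, marker=LEAF_MARKER):
    """Given the cgroup init finds itself in, return the cgroup it should
    treat as its root: the parent if the last component is the agreed
    leaf marker, otherwise the cgroup itself."""
    parent, last = posixpath.split(own_cgroup.rstrip("/") or "/")
    if last == marker:
        return parent
    return own_cgroup
```

So cgroup_root("/a/b/c/.cg_leaf") gives "/a/b/c", while a path without the
marker is returned unchanged.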

> If we do not want to create scopes we are left with the option of
> forcing "init" in a separate cgroup from the rest of the containers
> processes.
> 
> Christian
> 
> 
> > 2. maybe we can teach systemd to accept being in a leaf node
> > 3. maybe we can build an exception into cgroup namespaces such that
> > a cgns root also is an exception to the no-tasks-in-non-leaf-nodes
> > rule.  But I doubt that will fly.
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] cgroup V2 and LXC

2016-02-10 Thread Christian Brauner
On Wed, Feb 10, 2016 at 05:45:48PM +, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brau...@mailbox.org):
> > On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> > > Quoting Kevin Wilson (wkev...@gmail.com):
> > > > Hi, LXC developers,
> > > > 
> > > > The latest kernel release (4.4) includes initial support to cgroup v2
> > > > with 2 controllers (memory and io). Also it seems that the PIDs
> > > > controller works in cgroup v2, but I do not know if it is officially
> > > > supported in v2.
> > > > 
> > > > Is there any intention to replace the existing cgroup v1 usage in LXC
> > > > by cgroup v2 ? or at least to enable working with both of them ?
> > > > 
> > > > Regards,
> > > > Kevin
> > > 
> > > Replace, no, support, yes.  I've added support for it to cgmanager, and 
> > > have
> > > used lxc with the unified hierarchy through cgmanager.  Without cgmanager
> > > it will currently definitely not work.  It's worth discussing how we 
> > > should
> > > handle it - and how init wants us to handle it.   With cgmanager I 
> > > actually
> > > built in the support so that you could treat it as a legacy hierarchy, and
> > > upstart was happy with that since it used cgmanager.  Systemd will not be
> > > happy with that, and it will be a problem.  The only exception to the "no
> > > tasks in a non-leaf node" rule is for the / cgroup.  So lxc would need to
> > > place init in say /lxc/c1/.leaf, and systemd would have to accept that
> > > /lxc/c1 is the container's cgroup.  A few possibilities:
> > > 
> > > 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
> > Well, here is how I thought it could go (sticking to systemd specifics 
> > here):
> > - create a slice for all lxc "lxc.slice" (similar to 
> > "machine.slice" of
> >   systemd-nspawn backed containers)
> > - "lxc.slice" contains a scope for each container (e.g. "c1.scope")
> > - "c1.scope" contains an "init.scope"
> > - "init.scope" only contains the PID of "/sbin/init" as seen from 
> > the
> >   host (obviously)
> 
> So if we are creating container c1, are you talking about
> 
> /lxc/c1/lxc.slice/c1.scope/init.scope
> 
> or are you talking about a host-global
> 
> /lxc.slice
Yes, you have lxc.slice, and then you have all your machines under it. This is
what systemd-nspawn does, if I'm not mistaken.
> 
> with container-specific
> 
> /lxc.slice/c1.scope
> 
> per container?
> 
> ?
Yes.
> 
> > - All other processes are put in another slice "c1-something.slice"
> 
> Which other processes?
Well, all processes systemd starts are put in either system.slice or
user.slice. Everything else we start in the container (e.g. vim) is put in a
session.slice (e.g. session-0.slice, session-1000.slice).
> 
> AFAIK all other processes will be created by systemd.  The q is what will it
> do.  If we put systemd in /lxc.slice/c1.scope/init.scope, will it take that
> as its cgroup root and try to create and move itself into
> /lxc.slice/c1.scope/init.scope ?  If so it will fail since it cannot create a
> cgroup while it is in it.
I don't think so but I need to test that again. Time to boot unified.

> 
> So I think I've convinced myself that we need to collaborate with systemd
> on this.  Perhaps we can agree with it on a default cgroup in which it should
> be started to tell it "this is the leaf cgroup for your init".  So if it sees
> it is in /a/b/c/.cg_leaf, then it will know that /a/b/c is its root.
I thought the same, which is why I started to read some of the code.
FWIW, systemd-nspawn already works with the unified cgroup hierarchy, and I
think nesting works as well. But I'm not completely sure how nspawn handles
nesting.

> 
> > If we do not want to create scopes we are left with the option of
> > forcing "init" in a separate cgroup from the rest of the containers
> > processes.
> > 
> > Christian
> > 
> > 
> > > 2. maybe we can teach systemd to accept being in a leaf node
> > > 3. maybe we can build an exception into cgroup namespaces such that
> > > a cgns root also is an exception to the no-tasks-in-non-leaf-nodes
> > > rule.  But I doubt that will fly.


Re: [lxc-devel] cgroup V2 and LXC

2016-02-09 Thread Christian Brauner
On Mon, Feb 01, 2016 at 04:56:08AM +, Serge Hallyn wrote:
> Quoting Kevin Wilson (wkev...@gmail.com):
> > Hi, LXC developers,
> > 
> > The latest kernel release (4.4) includes initial support to cgroup v2
> > with 2 controllers (memory and io). Also it seems that the PIDs
> > controller works in cgroup v2, but I do not know if it is officially
> > supported in v2.
> > 
> > Is there any intention to replace the existing cgroup v1 usage in LXC
> > by cgroup v2 ? or at least to enable working with both of them ?
> > 
> > Regards,
> > Kevin
> 
> Replace, no, support, yes.  I've added support for it to cgmanager, and have
> used lxc with the unified hierarchy through cgmanager.  Without cgmanager
> it will currently definitely not work.  It's worth discussing how we should
> handle it - and how init wants us to handle it.   With cgmanager I actually
> built in the support so that you could treat it as a legacy hierarchy, and
> upstart was happy with that since it used cgmanager.  Systemd will not be
> happy with that, and it will be a problem.  The only exception to the "no
> tasks in a non-leaf node" rule is for the / cgroup.  So lxc would need to
> place init in say /lxc/c1/.leaf, and systemd would have to accept that
> /lxc/c1 is the container's cgroup.  A few possibilities:
> 
> 1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
Well, here is how I thought it could go (sticking to systemd specifics here):
- create a slice for all LXC containers, "lxc.slice" (similar to the
  "machine.slice" of systemd-nspawn backed containers)
- "lxc.slice" contains a scope for each container (e.g. "c1.scope")
- "c1.scope" contains an "init.scope"
- "init.scope" only contains the PID of "/sbin/init" as seen from the
  host (obviously)
- All other processes are put in another slice "c1-something.slice"

If we do not want to create scopes, we are left with the option of
forcing "init" into a separate cgroup from the rest of the container's
processes.
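
Spelled out as paths, the layout could look roughly like the sketch below.
container_cgroups is a made-up helper, and placing "c1-something.slice"
next to init.scope inside c1.scope is my assumption here; the list above
does not pin down where that slice lives.

```python
import posixpath


def container_cgroups(name, slice_name="lxc.slice", extra="something"):
    """Return the cgroup paths for one container under the proposed layout.

    name       -- container name, e.g. "c1"
    slice_name -- host-global slice holding all containers
    extra      -- placeholder suffix for the slice that holds everything
                  except init (its location is assumed, see above)
    """
    scope = posixpath.join("/", slice_name, name + ".scope")
    return {
        "slice": posixpath.join("/", slice_name),
        "scope": scope,
        "init": posixpath.join(scope, "init.scope"),
        "others": posixpath.join(scope, "%s-%s.slice" % (name, extra)),
    }


paths = container_cgroups("c1")
```

For c1 this gives /lxc.slice/c1.scope/init.scope for init, so c1.scope
itself never holds a process.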

Christian


> 2. maybe we can teach systemd to accept being in a leaf node
> 3. maybe we can build an exception into cgroup namespaces such that
> a cgns root also is an exception to the no-tasks-in-non-leaf-nodes
> rule.  But I doubt that will fly.


Re: [lxc-devel] cgroup V2 and LXC

2016-01-31 Thread Serge Hallyn
Quoting Kevin Wilson (wkev...@gmail.com):
> Hi, LXC developers,
> 
> The latest kernel release (4.4) includes initial support to cgroup v2
> with 2 controllers (memory and io). Also it seems that the PIDs
> controller works in cgroup v2, but I do not know if it is officially
> supported in v2.
> 
> Is there any intention to replace the existing cgroup v1 usage in LXC
> by cgroup v2 ? or at least to enable working with both of them ?
> 
> Regards,
> Kevin

Replace, no; support, yes.  I've added support for it to cgmanager, and have
used lxc with the unified hierarchy through cgmanager.  Without cgmanager
it will currently definitely not work.  It's worth discussing how we should
handle it - and how init wants us to handle it.  With cgmanager I actually
built in the support so that you could treat it as a legacy hierarchy, and
upstart was happy with that since it used cgmanager.  Systemd will not be
happy with that, and it will be a problem.  The only exception to the "no
tasks in a non-leaf node" rule is for the / cgroup.  So lxc would need to
place init in say /lxc/c1/.leaf, and systemd would have to accept that
/lxc/c1 is the container's cgroup.  A few possibilities:

1. maybe if we place systemd in /lxc/c1/init.scope it will be happy
2. maybe we can teach systemd to accept being in a leaf node
3. maybe we can build an exception into cgroup namespaces such that
a cgns root also is an exception to the no-tasks-in-non-leaf-nodes
rule.  But I doubt that will fly.
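
To illustrate the rule itself, here is a toy in-memory model (nothing in it
touches /sys/fs/cgroup, and all names are made up). As far as I understand,
the kernel ties the real check to controllers being enabled for the
children; this simplified version ignores that and just flags any non-root
cgroup that has both children and tasks.

```python
def violations(tree, procs):
    """Return the cgroup paths that break the "no tasks in a non-leaf
    node" rule in this simplified model.

    tree  -- dict mapping a cgroup path to the list of its child paths
    procs -- dict mapping a cgroup path to the list of PIDs it holds
    """
    bad = []
    for path, children in tree.items():
        if path == "/":
            continue  # the root cgroup is the one exception
        if children and procs.get(path):
            bad.append(path)
    return sorted(bad)


# init placed directly in the container's cgroup, which then gets a child
broken = violations(
    {"/": ["/lxc"], "/lxc": ["/lxc/c1"],
     "/lxc/c1": ["/lxc/c1/system.slice"],
     "/lxc/c1/system.slice": []},
    {"/lxc/c1": [1], "/lxc/c1/system.slice": [42]},
)

# init placed in a dedicated leaf, as in possibility 1
ok = violations(
    {"/": ["/lxc"], "/lxc": ["/lxc/c1"],
     "/lxc/c1": ["/lxc/c1/init.scope", "/lxc/c1/system.slice"],
     "/lxc/c1/init.scope": [], "/lxc/c1/system.slice": []},
    {"/lxc/c1/init.scope": [1], "/lxc/c1/system.slice": [42]},
)
```

In the first case /lxc/c1 is flagged; in the second nothing is, which is
the whole point of moving init into its own leaf.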