"Serge E. Hallyn" <se...@hallyn.com> writes: > Quoting Eric W. Biederman (ebied...@xmission.com): >> Adam Richter <adamricht...@gmail.com> writes: >> >> > On Linux 4.8-rc1 through 4-8-rc6 (latest rc), lxc fails start to >> > Ubuntu 16.04 and Centos 7 containers [1], unless I first run >> > "cgmanager -m name=systemd &" on the host, which, unlike the >> > containers, was not running systemd or cgmanager. >> >> Yes, that appears correct. Given the current flat namespace of >> hierarchies you fundamentally must coordinate with the host if you want >> to use a new hierarchy. So running cgmanager on the host seems like >> the minimum way to do that. >> >> If we truly need something more (which does not appear to be the case >> here) the names of hierarchies need to be moved into a namespace. >> >> > Git bisect revealed that this behavior began with a commit entitled >> > "cgroupns: Only allow creation of hierarchies in the initial cgroup >> > namespace" [2], which appears to be an attempt to protect against a >> > possible denial of service attack. Reversing the commit also restores >> > successful commit the need to run that cgmanager process. [Eric and >> > Tejun, I have bcc'ed you so you can be aware of this discussion >> > thread, as you apparently respectively wrote and approved the commit.] >> >> As far as I can tell you were getting lucky and not having problems >> before. >> >> > Running that cgmanager invocation is pretty simple, and seems to me to >> > be well worth closing a denial of service vulnerability, much as I >> > dislike adding something systemd-specific to a non-systemd environment >> > and adding a new dependency (lxc requires cgmanager on the host to >> > run, I guess, any container that runs systemd). 
>> > However, I am posting
>> > this message because I don't fully understand the problem, and, most
>> > importantly, I am wondering if I have stumbled on an unintended
>> > consequence of this commit that might indicate other potential
>> > breakage.
>>
>> I am surprised that your case worked but I don't think it amounts to an
>> unintended consequence.
>>
>> > If this new lxc behavior is completely acceptable, then I apologize
>> > for consuming people's time with it and hope that this message will
>> > allow others experiencing the same problem to find an answer for it
>> > when they search the web.
>>
>> I will let the lxc-developers judge.
>>
>> I don't think you hit a case that was expected to work. Furthermore
>
> fwiw indeed this was never expected to work.
>

As just creating the hierarchy before starting the container fixes this, I
agree this does appear to be just a documentation issue.

>> either your containers were overprivileged or they would not have been
>> able to create subdirectories in the cgroup hierarchy. So I expect this
>> change transformed a subtle breakage (aka one you had not noticed yet)
>> into an explicit breakage.
>>
>> I am not subscribed to lxc-users so I don't know if anyone else has
>> replied to your post. Cc's would have been better than Bcc's for
>> getting feedback in a situation like this.
>>
>> Eric
>>
>>
>> > Adam Richter
>> >
>> >
>> > [1] Here is an example of failing to start one of these containers.
>> > $ sudo lxc-start --name ubuntu16.04_amd64 --foreground
>> > Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
>> > [!!!!!!] Failed to mount API filesystems, freezing.
>> > Freezing execution.
>> >
>> >
>> > [2] Here is the commit diff that triggers the new misbehavior.
>> > commit 726a4994b05ff5b6f83d64b5b43c3251217366ce
>> > Author: Eric W. Biederman <ebied...@xmission.com>
>> > Date: Fri Jul 15 06:36:44 2016 -0500
>> >
>> > cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
>> >
>> > Unprivileged users can't use hierarchies if they create them as they do not
>> > have privilieges to the root directory.
>> >
>> > Which means the only thing a hiearchy created by an unprivileged user
>> > is good for is expanding the number of cgroup links in every css_set,
>> > which is a DOS attack.
>> >
>> > We could allow hierarchies to be created in namespaces in the initial
>> > user namespace. Unfortunately there is only a single namespace for
>> > the names of heirarchies, so that is likely to create more confusion
>> > than not.
>> >
>> > So do the simple thing and restrict hiearchy creation to the initial
>> > cgroup namespace.
>> >
>> > Cc: sta...@vger.kernel.org
>> > Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
>> > Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
>> > Signed-off-by: Tejun Heo <t...@kernel.org>
>> >
>> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> > index e75efa8..e0be49f 100644
>> > --- a/kernel/cgroup.c
>> > +++ b/kernel/cgroup.c
>> > @@ -2215,12 +2215,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>> >  		goto out_unlock;
>> >  	}
>> >  
>> > -	/*
>> > -	 * We know this subsystem has not yet been bound. Users in a non-init
>> > -	 * user namespace may only mount hierarchies with no bound subsystems,
>> > -	 * i.e. 'none,name=user1'
>> > -	 */
>> > -	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
>> > +	/* Hierarchies may only be created in the initial cgroup namespace. */
>> > +	if (ns != &init_cgroup_ns) {
>> >  		ret = -EPERM;
>> >  		goto out_unlock;
>> >  	}
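
For readers trying to see why a container that mounts "none,name=systemd" worked before this commit and fails after it, the two checks in the diff can be modelled in a few lines of Python. This is only an illustrative sketch, not kernel code: the function names and boolean parameters are hypothetical stand-ins for `opts.none`, `capable(CAP_SYS_ADMIN)` and `ns != &init_cgroup_ns`. It also assumes, as the removed comment and the reported workaround suggest, that the check is reached only when the requested hierarchy does not exist yet.

```python
import errno


def old_check(opts_none, has_cap_sys_admin):
    """Model of the removed check: without CAP_SYS_ADMIN, only hierarchies
    with no bound subsystems ('none,name=...') could be created."""
    if not opts_none and not has_cap_sys_admin:
        return -errno.EPERM
    return 0


def new_check(in_init_cgroup_ns):
    """Model of the added check: hierarchies may only be created in the
    initial cgroup namespace."""
    if not in_init_cgroup_ns:
        return -errno.EPERM
    return 0


def mount_named_hierarchy(already_exists, patched, opts_none=True,
                          has_cap_sys_admin=False, in_init_cgroup_ns=False):
    """Return 0 on success or a negative errno, in the kernel's style.

    Assumption: mounting a hierarchy that already exists never reaches the
    creation check, which would explain why creating it on the host first
    makes the container start.
    """
    if already_exists:
        return 0
    if patched:
        return new_check(in_init_cgroup_ns)
    return old_check(opts_none, has_cap_sys_admin)


if __name__ == "__main__":
    # Defaults model a container in its own cgroup namespace that mounts
    # 'none,name=systemd' without CAP_SYS_ADMIN in the initial user namespace.
    for patched in (False, True):
        for exists in (False, True):
            ret = mount_named_hierarchy(already_exists=exists, patched=patched)
            print(f"patched={patched} exists_on_host={exists} -> {ret}")
```

Under these assumptions, the only combination that returns -EPERM is the patched kernel with no pre-existing hierarchy, which matches the "Operation not permitted" message in [1] and the effect of running cgmanager on the host first.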