On Mon, 2017-02-27 at 19:18 -0500, Paul Moore wrote: > On Mon, Feb 27, 2017 at 4:23 PM, Stephen Smalley <[email protected]> > wrote: > > > > On Mon, 2017-02-27 at 12:48 -0800, Nick Kralevich wrote: > > > > > > On Mon, Feb 27, 2017 at 11:53 AM, Stephen Smalley <[email protected]. > > > gov> > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > I can reproduce it on angler (with a back-port of just that > > > > > patch), > > > > > although I am unclear on the cause. The patch is only > > > > > supposed > > > > > to > > > > > enable explicit setting of security labels by userspace on > > > > > cgroup > > > > > files, so it isn't supposed to cause any breakage under > > > > > existing > > > > > policy. Prior to the patch, the kernel would always just > > > > > return > > > > > -1 > > > > > with errno EOPNOTSUPP upon attempts to set security labels on > > > > > cgroup > > > > > files; with the patch, the kernel may instead return -1 with > > > > > errno > > > > > EACCES if not allowed. So I suppose if userspace was > > > > > explicitly > > > > > testing for EOPNOTSUPP and not failing hard in that case, it > > > > > might > > > > > cause breakage. Not sure why existing userspace would be > > > > > trying > > > > > to > > > > > relabel cgroup files, unless it is just a recursive > > > > > restorecon > > > > > that > > > > > happens to traverse into a cgroup mount (and in that case, > > > > > not > > > > > sure > > > > > why > > > > > it would be fatal). Other possible interaction would be use > > > > > of > > > > > setfscreatecon() prior to creating a file in cgroup. > > > > > > > > Oh, I see - it is the latter. > > > > > > > > For example, init.rc does mkdir /dev/cpuctl/bg_non_interactive, > > > > which > > > > internally looks up the context for that directory from > > > > file_contexts > > > > and does a setfscreatecon() followed by a mkdir(). Previously, > > > > that > > > > was ignored because cgroup did not support anything other than > > > > the > > > > policy-defined label. But now it will try to use that label, > > > > which > > > > in > > > > turn will trigger a denial in enforcing mode and the create > > > > will > > > > fail. > > > > > > > > So this is an incompatible change and needs to be reverted. > > > > We'll need to wrap it up with a policy capability or something > > > > to > > > > allow > > > > it to be enabled only if the policy correctly supports > > > > it. Even > > > > better, we should instead just allow the policy to specify > > > > which > > > > filesystems should support this behavior (already on the issues > > > > list). > > > > > > > > > > If Android is the only system affected by this bug, I would > > > prefer to > > > just fix Android to allow for this patch, rather than having > > > additional kernel complexity. > > > > Well, it does break userspace (even if it happens to only affect > > Android, which isn't clear, e.g. possibly a distribution would > > likewise > > suffer breakage under a tighter policy), and we already have a > > long- > > standing open issue to replace the current set of whitelisted > > filesystem types with something configuration-driven. So I'm ok > > with > > reverting it and requiring it to be done in a more general > > way. The > > latter is something we want regardless. > > This went up to Linus during the current merge window via the > stable-4.11 branch and I know the container guys really want this so > I'd prefer to fix this up in 4.11 with a policy capability if > possible > (and I believe it should be). I agree with Stephen that we need a > better long term solution, but I think a policy capability should > work > in the short term. > > Who wants to send me a patch? ;)
So, there are a couple of caveats with doing that: 1) It still requires the container folks to update their kernel, libsepol, and policy in order to make use of the new policy capability. 2) The determination of whether a given mount should be assigned this flag is made at mount time, so you can't simply reload policy with a policy that defines this capability and have it automatically applied to existing cgroup mounts. You'd have to unmount and re-mount them (more likely reboot). Not saying you can't do that, just understand what is required.

