Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 17, 2014 1:46:59 PM EST, Andy Lutomirski wrote: >On Mon, Nov 17, 2014 at 10:31 AM, Andy Lutomirski >wrote: >> On Mon, Nov 17, 2014 at 10:06 AM, Casey Schaufler >> wrote: >>> On 11/15/2014 1:01 AM, Josh Triplett wrote: Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of >supplementary groups cannot further drop permissions without obtaining elevated permissions first. >>> >>> Has anyone put any thought into how this will interact with >>> POSIX ACLs? I don't see that anywhere in the discussion. >> >> That means that user namespaces are a problem, too, and we need to >fix >> it. Or we should add some control to turn unprivileged user >namespace >> creation on and off and document that turning it on defeats POSIX >ACLs >> with a group entry that is more restrictive than the other entry. >> > >This is a significant enough issue that I posted it to oss-security: > >http://www.openwall.com/lists/oss-security/2014/11/17/19 > >It's not at all obvious to me how to fix it. We could disallow userns >creation of any supplementary groups don't match fsuid, or we could >keep negative-only groups around in the userns. > >It may be worth adding a sysctl to change the behavior, too. IMO it's >absurd to use groups to deny permissions that are otherwise available. There is an obvious user namespace fix. Don't allow dropping supplemental groups that are not mapped. That will require a little bit of fancy footwork if you want to play with supplemental groups in your unprivileged user namespace. I would like to get a grip on what hoops would be required before we add the additional restriction. Possibly something as simple as calling sg. I also want to look at what Tizen and any other concrete pieces of code I can find using this negative permission pattern are actually doing. Bugs definitely exist, but I have this erie feeling that the bugs may be in instances of userspace using this negative group permission pattern. I think we may have a hideous case of one setuid binary defeating a privilege check of another piece of code. This issue looks like it is worth a full scale investigation. Sigh. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 17, 2014 1:07:30 PM EST, Andy Lutomirski wrote: >On Nov 17, 2014 3:37 AM, "One Thousand Gnomes" > wrote: >> >> > optional), I can do that too. The security model of "having a >group >> > gives you less privilege than not having it" seems crazy, but >> > nonetheless I can see a couple of easy ways that we can avoid >breaking >> >> It's an old pattern of use that makes complete sense in a traditional >> Unix permission world because it's the only way to do "exclude >{list}" >> nicely. Our default IMHO shouldn't break this. >> >> > that pattern, no_new_privs being one of them. I'd like to make >sure >> > that nobody sees any other real-world corner case that unprivileged >> > setgroups would break. >> >> Barring the usual risk of people doing improper error checking I >don't >> see one immediately. >> >> For containers I think it actually makes sense that the sysctl can be >> applied per container anyway. > >We'll probably need per container sysctls some day. We already have a mess of per network namespace sysctls, as well as few for other namespaces. We have the infrastructure it is just a matter of using it for whatever purpose we need. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 17, 2014 1:07:30 PM EST, Andy Lutomirski l...@amacapital.net wrote: On Nov 17, 2014 3:37 AM, One Thousand Gnomes gno...@lxorguk.ukuu.org.uk wrote: optional), I can do that too. The security model of having a group gives you less privilege than not having it seems crazy, but nonetheless I can see a couple of easy ways that we can avoid breaking It's an old pattern of use that makes complete sense in a traditional Unix permission world because it's the only way to do exclude {list} nicely. Our default IMHO shouldn't break this. that pattern, no_new_privs being one of them. I'd like to make sure that nobody sees any other real-world corner case that unprivileged setgroups would break. Barring the usual risk of people doing improper error checking I don't see one immediately. For containers I think it actually makes sense that the sysctl can be applied per container anyway. We'll probably need per container sysctls some day. We already have a mess of per network namespace sysctls, as well as few for other namespaces. We have the infrastructure it is just a matter of using it for whatever purpose we need. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 17, 2014 1:46:59 PM EST, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 17, 2014 at 10:31 AM, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 17, 2014 at 10:06 AM, Casey Schaufler ca...@schaufler-ca.com wrote: On 11/15/2014 1:01 AM, Josh Triplett wrote: Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Has anyone put any thought into how this will interact with POSIX ACLs? I don't see that anywhere in the discussion. That means that user namespaces are a problem, too, and we need to fix it. Or we should add some control to turn unprivileged user namespace creation on and off and document that turning it on defeats POSIX ACLs with a group entry that is more restrictive than the other entry. This is a significant enough issue that I posted it to oss-security: http://www.openwall.com/lists/oss-security/2014/11/17/19 It's not at all obvious to me how to fix it. We could disallow userns creation of any supplementary groups don't match fsuid, or we could keep negative-only groups around in the userns. It may be worth adding a sysctl to change the behavior, too. IMO it's absurd to use groups to deny permissions that are otherwise available. There is an obvious user namespace fix. Don't allow dropping supplemental groups that are not mapped. That will require a little bit of fancy footwork if you want to play with supplemental groups in your unprivileged user namespace. I would like to get a grip on what hoops would be required before we add the additional restriction. Possibly something as simple as calling sg. I also want to look at what Tizen and any other concrete pieces of code I can find using this negative permission pattern are actually doing. Bugs definitely exist, but I have this erie feeling that the bugs may be in instances of userspace using this negative group permission pattern. I think we may have a hideous case of one setuid binary defeating a privilege check of another piece of code. This issue looks like it is worth a full scale investigation. Sigh. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] fs: add O_BENEATH flag to openat(2)
On November 3, 2014 7:42:58 AM PST, Andy Lutomirski wrote: >On Mon, Nov 3, 2014 at 7:20 AM, Al Viro >wrote: >> On Mon, Nov 03, 2014 at 11:48:23AM +, David Drysdale wrote: >>> Add a new O_BENEATH flag for openat(2) which restricts the >>> provided path, rejecting (with -EACCES) paths that are not beneath >>> the provided dfd. In particular, reject: >>> - paths that contain .. components >>> - paths that begin with / >>> - symlinks that have paths as above. >> >> Yecch... The degree of usefulness aside (and I'm not convinced that >it >> is non-zero), > >This is extremely useful in conjunction with seccomp. > >> WTF pass one bit out of nameidata->flags in a separate argument? >> Through the mutual recursion, no less... And then you are not even >attempting >> to detect symlinks that are not followed by interpretation of _any_ >pathname. > >How many symlinks like that are there? Is there anything except >nd_jump_link users? All of those are in /proc. Arguably O_BENEATH >should prevent traversal of all of those links. Not commenting on the sanity of this one way or another, and I haven't read the patch. There is an absolutely trivial implementation of this. After the path is resolved, walk backwards along d_parent and the mount tree, and see if you come to the file or directory dfd refers to. That can handle magic proc symlinks, and does not need to disallow .. or / explicitly so it should be much simpler code. My gut says that if Al says blech when looking at your code it is too complex to give you a security guarantee. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] fs: add O_BENEATH flag to openat(2)
On November 3, 2014 7:42:58 AM PST, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 3, 2014 at 7:20 AM, Al Viro v...@zeniv.linux.org.uk wrote: On Mon, Nov 03, 2014 at 11:48:23AM +, David Drysdale wrote: Add a new O_BENEATH flag for openat(2) which restricts the provided path, rejecting (with -EACCES) paths that are not beneath the provided dfd. In particular, reject: - paths that contain .. components - paths that begin with / - symlinks that have paths as above. Yecch... The degree of usefulness aside (and I'm not convinced that it is non-zero), This is extremely useful in conjunction with seccomp. WTF pass one bit out of nameidata-flags in a separate argument? Through the mutual recursion, no less... And then you are not even attempting to detect symlinks that are not followed by interpretation of _any_ pathname. How many symlinks like that are there? Is there anything except nd_jump_link users? All of those are in /proc. Arguably O_BENEATH should prevent traversal of all of those links. Not commenting on the sanity of this one way or another, and I haven't read the patch. There is an absolutely trivial implementation of this. After the path is resolved, walk backwards along d_parent and the mount tree, and see if you come to the file or directory dfd refers to. That can handle magic proc symlinks, and does not need to disallow .. or / explicitly so it should be much simpler code. My gut says that if Al says blech when looking at your code it is too complex to give you a security guarantee. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] dealing with proc_ns_follow_link() and "namespace" dentries
On November 1, 2014 8:06:20 AM PDT, Al Viro wrote: >On Sat, Nov 01, 2014 at 08:38:04AM +, Al Viro wrote: >> OK, interim branch (_completely_ untested, and there's quite a bit of >> work remaining) is in vfs.git#nsfs. > >... except that what got pushed was completely buggered - the last >changes >not committed *and* include/linux/ns_common.h not git-add'ed at the >very >beginning. Oh, well - it wasn't hard to reconstruct its history... > >Anyway, that much got fixed and pushed; sorry about the noise - >shouldn't >have posted before grabbing some sleep... No worries. I am effectively without internet access right from now until about the 3rd week in November as I move so I really can't look anyway. >From your description I am concerned about using the letter combination nsfs. > I once used nsfd, and that was so close to nfsd that Linus got confused, and >hilarity ensued. nsfs isn't quite as bad but the abbreviation still seems >close enough to nfs that confusion could result. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] dealing with proc_ns_follow_link() and namespace dentries
On November 1, 2014 8:06:20 AM PDT, Al Viro v...@zeniv.linux.org.uk wrote: On Sat, Nov 01, 2014 at 08:38:04AM +, Al Viro wrote: OK, interim branch (_completely_ untested, and there's quite a bit of work remaining) is in vfs.git#nsfs. ... except that what got pushed was completely buggered - the last changes not committed *and* include/linux/ns_common.h not git-add'ed at the very beginning. Oh, well - it wasn't hard to reconstruct its history... Anyway, that much got fixed and pushed; sorry about the noise - shouldn't have posted before grabbing some sleep... No worries. I am effectively without internet access right from now until about the 3rd week in November as I move so I really can't look anyway. From your description I am concerned about using the letter combination nsfs. I once used nsfd, and that was so close to nfsd that Linus got confused, and hilarity ensued. nsfs isn't quite as bad but the abbreviation still seems close enough to nfs that confusion could result. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support
On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski wrote: >On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman > wrote: >> "Serge E. Hallyn" writes: >> >>> Quoting Aditya Kali (adityak...@google.com): On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn >wrote: > Quoting Aditya Kali (adityak...@google.com): >> setns on a cgroup namespace is allowed only if >> * task has CAP_SYS_ADMIN in its current user-namespace and >> over the user-namespace associated with target cgroupns. >> * task's current cgroup is descendent of the target >cgroupns-root >> cgroup. > > What is the point of this? > > If I'm a user logged into > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start > a container which is in > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1 > then I will want to be able to enter the container's cgroup. > The container's cgroup root is under my own (satisfying the > below condition0 but my cgroup is not a descendent of the > container's cgroup. > This condition is there because we don't want to do implicit cgroup changes when a process attaches to another cgroupns. cgroupns tries >to preserve the invariant that at any point, your current cgroup is always under the cgroupns-root of your cgroup namespace. But in >your example, if we allow a process in "session-c12.scope" container to attach to cgroupns root'ed at "session-c12.scope/x1" container (without implicitly moving its cgroup), then this invariant won't hold. >>> >>> Oh, I see. Guess that should be workable. Thanks. >> >> Which has me looking at what the rules are for moving through >> the cgroup hierarchy. >> >> As long as we have write access to cgroup.procs and are allowed >> to open the file for write, we can move any of our own tasks >> into the cgroup. So the cgroup namespace rules don't seem >> to be a problem. >> >> Andy can you please take a look at the permission checks in >> __cgroup_procs_write. > >The actual requirements for calling that function haven't changed, >right? IOW, what does this have to do with cgroupns? Excluding user namespaces the requirements have not changed. The immediate correlation is that to enter a cgroupns you must first put your process in one of it's cgroups. So I was examining what it would take to enter the cgroup of cgroupns. > Is the idea >that you want a privileged user wrt a cgroupns's userns to be able to >use this? If so: > >Yes, that current_cred() thing is bogus. (Actually, this is probably >exploitable right now if any cgroup.procs inode anywhere on the system >lets non-root write.) (Can we have some kernel debugging option that >makes any use of current_cred() in write(2) warn?) > >We really need a weaker version of may_ptrace for this kind of stuff. >Maybe the existing may_ptrace stuff is okay, actually. But this is >completely missing group checks, cap checks, capabilities wrt the >userns, etc. > >Also, I think that, if this version of the patchset allows non-init >userns to unshare cgroupns, then the issue of what permission is >needed to lock the cgroup hierarchy like that needs to be addressed, >because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin >the calling task with no permission required. Bolting on a fix later >will be a mess. I imagine the pinning would be like the userns. Ah but there is a potentially serious issue with the pinning. With pinning we can make it impossible for root to move us to a different cgroup. I am not certain how serious that is but it bears thinking about. If we don't implement pinning we should be able to implent everything with just filesystem mount options, and no new namespace required. Sigh. I am too tired tonight to see the end game in this. Eric >> As I read the code I see 3 security gaffaws in the permssion check. >> - Using current->cred instead of file->f_cred. >> - Not checking tcred->euid. >> - Checking GLOBAL_ROOT_UID instead of having a capable call. >> >> The file permission on cgroup.procs seem just sufficient to keep >> to keep those bugs from being easily exploitable. >> >> Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support
On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski l...@amacapital.net wrote: On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman ebied...@xmission.com wrote: Serge E. Hallyn se...@hallyn.com writes: Quoting Aditya Kali (adityak...@google.com): On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn se...@hallyn.com wrote: Quoting Aditya Kali (adityak...@google.com): setns on a cgroup namespace is allowed only if * task has CAP_SYS_ADMIN in its current user-namespace and over the user-namespace associated with target cgroupns. * task's current cgroup is descendent of the target cgroupns-root cgroup. What is the point of this? If I'm a user logged into /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start a container which is in /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1 then I will want to be able to enter the container's cgroup. The container's cgroup root is under my own (satisfying the below condition0 but my cgroup is not a descendent of the container's cgroup. This condition is there because we don't want to do implicit cgroup changes when a process attaches to another cgroupns. cgroupns tries to preserve the invariant that at any point, your current cgroup is always under the cgroupns-root of your cgroup namespace. But in your example, if we allow a process in session-c12.scope container to attach to cgroupns root'ed at session-c12.scope/x1 container (without implicitly moving its cgroup), then this invariant won't hold. Oh, I see. Guess that should be workable. Thanks. Which has me looking at what the rules are for moving through the cgroup hierarchy. As long as we have write access to cgroup.procs and are allowed to open the file for write, we can move any of our own tasks into the cgroup. So the cgroup namespace rules don't seem to be a problem. Andy can you please take a look at the permission checks in __cgroup_procs_write. The actual requirements for calling that function haven't changed, right? IOW, what does this have to do with cgroupns? Excluding user namespaces the requirements have not changed. The immediate correlation is that to enter a cgroupns you must first put your process in one of it's cgroups. So I was examining what it would take to enter the cgroup of cgroupns. Is the idea that you want a privileged user wrt a cgroupns's userns to be able to use this? If so: Yes, that current_cred() thing is bogus. (Actually, this is probably exploitable right now if any cgroup.procs inode anywhere on the system lets non-root write.) (Can we have some kernel debugging option that makes any use of current_cred() in write(2) warn?) We really need a weaker version of may_ptrace for this kind of stuff. Maybe the existing may_ptrace stuff is okay, actually. But this is completely missing group checks, cap checks, capabilities wrt the userns, etc. Also, I think that, if this version of the patchset allows non-init userns to unshare cgroupns, then the issue of what permission is needed to lock the cgroup hierarchy like that needs to be addressed, because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin the calling task with no permission required. Bolting on a fix later will be a mess. I imagine the pinning would be like the userns. Ah but there is a potentially serious issue with the pinning. With pinning we can make it impossible for root to move us to a different cgroup. I am not certain how serious that is but it bears thinking about. If we don't implement pinning we should be able to implent everything with just filesystem mount options, and no new namespace required. Sigh. I am too tired tonight to see the end game in this. Eric As I read the code I see 3 security gaffaws in the permssion check. - Using current-cred instead of file-f_cred. - Not checking tcred-euid. - Checking GLOBAL_ROOT_UID instead of having a capable call. The file permission on cgroup.procs seem just sufficient to keep to keep those bugs from being easily exploitable. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/