Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-17 Thread Eric W.Biederman


On November 17, 2014 1:46:59 PM EST, Andy Lutomirski  
wrote:
>On Mon, Nov 17, 2014 at 10:31 AM, Andy Lutomirski 
>wrote:
>> On Mon, Nov 17, 2014 at 10:06 AM, Casey Schaufler
>>  wrote:
>>> On 11/15/2014 1:01 AM, Josh Triplett wrote:
 Currently, unprivileged processes (without CAP_SETGID) cannot call
 setgroups at all.  In particular, processes with a set of
>supplementary
 groups cannot further drop permissions without obtaining elevated
 permissions first.
>>>
>>> Has anyone put any thought into how this will interact with
>>> POSIX ACLs? I don't see that anywhere in the discussion.
>>
>> That means that user namespaces are a problem, too, and we need to
>fix
>> it.  Or we should add some control to turn unprivileged user
>namespace
>> creation on and off and document that turning it on defeats POSIX
>ACLs
>> with a group entry that is more restrictive than the other entry.
>>
>
>This is a significant enough issue that I posted it to oss-security:
>
>http://www.openwall.com/lists/oss-security/2014/11/17/19
>
>It's not at all obvious to me how to fix it.  We could disallow userns
>creation of any supplementary groups don't match fsuid, or we could
>keep negative-only groups around in the userns.
>
>It may be worth adding a sysctl to change the behavior, too.  IMO it's
>absurd to use groups to deny permissions that are otherwise available.

There is an obvious user namespace fix.  Don't allow dropping supplemental 
groups that are not mapped.

That will require a little bit of fancy footwork if you want to play with 
supplemental groups in your unprivileged user namespace.   I would like to get 
a grip on what hoops would be required before we add the additional 
restriction.  Possibly something as simple as calling sg.

I also want to look at what Tizen and any other concrete pieces of code I can 
find using this negative permission pattern are actually doing.   Bugs 
definitely exist, but I have this erie feeling that the bugs may be in 
instances of userspace using this negative group permission pattern.  I think 
we may have a hideous case of one setuid binary defeating a privilege check of 
another piece of code.

This issue looks like it is worth a full scale investigation.  Sigh.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-17 Thread Eric W.Biederman


On November 17, 2014 1:07:30 PM EST, Andy Lutomirski  
wrote:
>On Nov 17, 2014 3:37 AM, "One Thousand Gnomes"
> wrote:
>>
>> > optional), I can do that too.  The security model of "having a
>group
>> > gives you less privilege than not having it" seems crazy, but
>> > nonetheless I can see a couple of easy ways that we can avoid
>breaking
>>
>> It's an old pattern of use that makes complete sense in a traditional
>> Unix permission world because it's the only way to do "exclude
>{list}"
>> nicely. Our default IMHO shouldn't break this.
>>
>> > that pattern, no_new_privs being one of them.  I'd like to make
>sure
>> > that nobody sees any other real-world corner case that unprivileged
>> > setgroups would break.
>>
>> Barring the usual risk of people doing improper error checking I
>don't
>> see one immediately.
>>
>> For containers I think it actually makes sense that the sysctl can be
>> applied per container anyway.
>
>We'll probably need per container sysctls some day.

We already have a mess of per network namespace sysctls,
as well as few for other namespaces.

We have the infrastructure it is just a matter of using it for whatever purpose 
we need.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-17 Thread Eric W.Biederman


On November 17, 2014 1:07:30 PM EST, Andy Lutomirski l...@amacapital.net 
wrote:
On Nov 17, 2014 3:37 AM, One Thousand Gnomes
gno...@lxorguk.ukuu.org.uk wrote:

  optional), I can do that too.  The security model of having a
group
  gives you less privilege than not having it seems crazy, but
  nonetheless I can see a couple of easy ways that we can avoid
breaking

 It's an old pattern of use that makes complete sense in a traditional
 Unix permission world because it's the only way to do exclude
{list}
 nicely. Our default IMHO shouldn't break this.

  that pattern, no_new_privs being one of them.  I'd like to make
sure
  that nobody sees any other real-world corner case that unprivileged
  setgroups would break.

 Barring the usual risk of people doing improper error checking I
don't
 see one immediately.

 For containers I think it actually makes sense that the sysctl can be
 applied per container anyway.

We'll probably need per container sysctls some day.

We already have a mess of per network namespace sysctls,
as well as few for other namespaces.

We have the infrastructure it is just a matter of using it for whatever purpose 
we need.

Eric

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-17 Thread Eric W.Biederman


On November 17, 2014 1:46:59 PM EST, Andy Lutomirski l...@amacapital.net 
wrote:
On Mon, Nov 17, 2014 at 10:31 AM, Andy Lutomirski l...@amacapital.net
wrote:
 On Mon, Nov 17, 2014 at 10:06 AM, Casey Schaufler
 ca...@schaufler-ca.com wrote:
 On 11/15/2014 1:01 AM, Josh Triplett wrote:
 Currently, unprivileged processes (without CAP_SETGID) cannot call
 setgroups at all.  In particular, processes with a set of
supplementary
 groups cannot further drop permissions without obtaining elevated
 permissions first.

 Has anyone put any thought into how this will interact with
 POSIX ACLs? I don't see that anywhere in the discussion.

 That means that user namespaces are a problem, too, and we need to
fix
 it.  Or we should add some control to turn unprivileged user
namespace
 creation on and off and document that turning it on defeats POSIX
ACLs
 with a group entry that is more restrictive than the other entry.


This is a significant enough issue that I posted it to oss-security:

http://www.openwall.com/lists/oss-security/2014/11/17/19

It's not at all obvious to me how to fix it.  We could disallow userns
creation of any supplementary groups don't match fsuid, or we could
keep negative-only groups around in the userns.

It may be worth adding a sysctl to change the behavior, too.  IMO it's
absurd to use groups to deny permissions that are otherwise available.

There is an obvious user namespace fix.  Don't allow dropping supplemental 
groups that are not mapped.

That will require a little bit of fancy footwork if you want to play with 
supplemental groups in your unprivileged user namespace.   I would like to get 
a grip on what hoops would be required before we add the additional 
restriction.  Possibly something as simple as calling sg.

I also want to look at what Tizen and any other concrete pieces of code I can 
find using this negative permission pattern are actually doing.   Bugs 
definitely exist, but I have this erie feeling that the bugs may be in 
instances of userspace using this negative group permission pattern.  I think 
we may have a hideous case of one setuid binary defeating a privilege check of 
another piece of code.

This issue looks like it is worth a full scale investigation.  Sigh.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] fs: add O_BENEATH flag to openat(2)

2014-11-03 Thread Eric W.Biederman


On November 3, 2014 7:42:58 AM PST, Andy Lutomirski  wrote:
>On Mon, Nov 3, 2014 at 7:20 AM, Al Viro 
>wrote:
>> On Mon, Nov 03, 2014 at 11:48:23AM +, David Drysdale wrote:
>>> Add a new O_BENEATH flag for openat(2) which restricts the
>>> provided path, rejecting (with -EACCES) paths that are not beneath
>>> the provided dfd.  In particular, reject:
>>>  - paths that contain .. components
>>>  - paths that begin with /
>>>  - symlinks that have paths as above.
>>
>> Yecch...  The degree of usefulness aside (and I'm not convinced that
>it
>> is non-zero),
>
>This is extremely useful in conjunction with seccomp.
>
>> WTF pass one bit out of nameidata->flags in a separate argument?
>> Through the mutual recursion, no less...  And then you are not even
>attempting
>> to detect symlinks that are not followed by interpretation of _any_
>pathname.
>
>How many symlinks like that are there?  Is there anything except
>nd_jump_link users?  All of those are in /proc.  Arguably O_BENEATH
>should prevent traversal of all of those links.

Not commenting on the sanity of this one way or another, and I haven't read the 
patch.  There is an absolutely trivial implementation of this.

After the path is resolved, walk backwards along d_parent and the mount tree, 
and see if you come to the file or directory dfd refers to.

That can handle magic proc symlinks, and does not need to disallow .. or / 
explicitly so it should be much simpler code.

My gut says that if Al says blech when looking at your code it is too complex 
to give you a security guarantee.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] fs: add O_BENEATH flag to openat(2)

2014-11-03 Thread Eric W.Biederman


On November 3, 2014 7:42:58 AM PST, Andy Lutomirski l...@amacapital.net wrote:
On Mon, Nov 3, 2014 at 7:20 AM, Al Viro v...@zeniv.linux.org.uk
wrote:
 On Mon, Nov 03, 2014 at 11:48:23AM +, David Drysdale wrote:
 Add a new O_BENEATH flag for openat(2) which restricts the
 provided path, rejecting (with -EACCES) paths that are not beneath
 the provided dfd.  In particular, reject:
  - paths that contain .. components
  - paths that begin with /
  - symlinks that have paths as above.

 Yecch...  The degree of usefulness aside (and I'm not convinced that
it
 is non-zero),

This is extremely useful in conjunction with seccomp.

 WTF pass one bit out of nameidata-flags in a separate argument?
 Through the mutual recursion, no less...  And then you are not even
attempting
 to detect symlinks that are not followed by interpretation of _any_
pathname.

How many symlinks like that are there?  Is there anything except
nd_jump_link users?  All of those are in /proc.  Arguably O_BENEATH
should prevent traversal of all of those links.

Not commenting on the sanity of this one way or another, and I haven't read the 
patch.  There is an absolutely trivial implementation of this.

After the path is resolved, walk backwards along d_parent and the mount tree, 
and see if you come to the file or directory dfd refers to.

That can handle magic proc symlinks, and does not need to disallow .. or / 
explicitly so it should be much simpler code.

My gut says that if Al says blech when looking at your code it is too complex 
to give you a security guarantee.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] dealing with proc_ns_follow_link() and "namespace" dentries

2014-11-01 Thread Eric W.Biederman


On November 1, 2014 8:06:20 AM PDT, Al Viro  wrote:
>On Sat, Nov 01, 2014 at 08:38:04AM +, Al Viro wrote:
>> OK, interim branch (_completely_ untested, and there's quite a bit of
>> work remaining) is in vfs.git#nsfs.
>
>... except that what got pushed was completely buggered - the last
>changes
>not committed *and* include/linux/ns_common.h not git-add'ed at the
>very
>beginning.  Oh, well - it wasn't hard to reconstruct its history...
>
>Anyway, that much got fixed and pushed; sorry about the noise -
>shouldn't
>have posted before grabbing some sleep...

No worries.  I am effectively without internet access right from now until 
about the 3rd week in November as I move so I really can't look anyway.

>From your description I am concerned about using the letter combination nsfs.  
> I once used nsfd, and that was so close to nfsd that Linus got confused, and 
>hilarity ensued.   nsfs isn't quite as bad but the abbreviation still seems 
>close enough to nfs that confusion could result.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] dealing with proc_ns_follow_link() and namespace dentries

2014-11-01 Thread Eric W.Biederman


On November 1, 2014 8:06:20 AM PDT, Al Viro v...@zeniv.linux.org.uk wrote:
On Sat, Nov 01, 2014 at 08:38:04AM +, Al Viro wrote:
 OK, interim branch (_completely_ untested, and there's quite a bit of
 work remaining) is in vfs.git#nsfs.

... except that what got pushed was completely buggered - the last
changes
not committed *and* include/linux/ns_common.h not git-add'ed at the
very
beginning.  Oh, well - it wasn't hard to reconstruct its history...

Anyway, that much got fixed and pushed; sorry about the noise -
shouldn't
have posted before grabbing some sleep...

No worries.  I am effectively without internet access right from now until 
about the 3rd week in November as I move so I really can't look anyway.

From your description I am concerned about using the letter combination nsfs.  
 I once used nsfd, and that was so close to nfsd that Linus got confused, and 
hilarity ensued.   nsfs isn't quite as bad but the abbreviation still seems 
close enough to nfs that confusion could result.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support

2014-10-19 Thread Eric W.Biederman


On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski  wrote:
>On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
> wrote:
>> "Serge E. Hallyn"  writes:
>>
>>> Quoting Aditya Kali (adityak...@google.com):
 On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn 
>wrote:
 > Quoting Aditya Kali (adityak...@google.com):
 >> setns on a cgroup namespace is allowed only if
 >> * task has CAP_SYS_ADMIN in its current user-namespace and
 >>   over the user-namespace associated with target cgroupns.
 >> * task's current cgroup is descendent of the target
>cgroupns-root
 >>   cgroup.
 >
 > What is the point of this?
 >
 > If I'm a user logged into
 > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
 > a container which is in
 > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
 > then I will want to be able to enter the container's cgroup.
 > The container's cgroup root is under my own (satisfying the
 > below condition0 but my cgroup is not a descendent of the
 > container's cgroup.
 >
 This condition is there because we don't want to do implicit cgroup
 changes when a process attaches to another cgroupns. cgroupns tries
>to
 preserve the invariant that at any point, your current cgroup is
 always under the cgroupns-root of your cgroup namespace. But in
>your
 example, if we allow a process in "session-c12.scope" container to
 attach to cgroupns root'ed at "session-c12.scope/x1" container
 (without implicitly moving its cgroup), then this invariant won't
 hold.
>>>
>>> Oh, I see.  Guess that should be workable.  Thanks.
>>
>> Which has me looking at what the rules are for moving through
>> the cgroup hierarchy.
>>
>> As long as we have write access to cgroup.procs and are allowed
>> to open the file for write, we can move any of our own tasks
>> into the cgroup.  So the cgroup namespace rules don't seem
>> to be a problem.
>>
>> Andy can you please take a look at the permission checks in
>> __cgroup_procs_write.
>
>The actual requirements for calling that function haven't changed,
>right?  IOW, what does this have to do with cgroupns?

Excluding user namespaces the requirements have not changed.  

The immediate correlation is that to enter a cgroupns you must first put your 
process in one of it's cgroups.

So I was examining what it would take to enter the cgroup of cgroupns.

> Is the idea
>that you want a privileged user wrt a cgroupns's userns to be able to
>use this?  If so:
>
>Yes, that current_cred() thing is bogus.  (Actually, this is probably
>exploitable right now if any cgroup.procs inode anywhere on the system
>lets non-root write.)  (Can we have some kernel debugging option that
>makes any use of current_cred() in write(2) warn?)
>
>We really need a weaker version of may_ptrace for this kind of stuff.
>Maybe the existing may_ptrace stuff is okay, actually.  But this is
>completely missing group checks, cap checks, capabilities wrt the
>userns, etc.
>
>Also, I think that, if this version of the patchset allows non-init
>userns to unshare cgroupns, then the issue of what permission is
>needed to lock the cgroup hierarchy like that needs to be addressed,
>because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
>the calling task with no permission required.  Bolting on a fix later
>will be a mess.

I imagine the pinning would be like the userns.

Ah but there is a potentially serious issue with the pinning.
With pinning we can make it impossible for root to move us to a different 
cgroup. 

I am not certain how serious that is but it bears thinking about.
If we don't implement pinning we should be able to implent everything with just 
filesystem mount options, and no new namespace required.

Sigh.

I am too tired tonight to see the end game in this.

Eric
>> As I read the code I see 3 security gaffaws in the permssion check.
>> - Using current->cred instead of file->f_cred.
>> - Not checking tcred->euid.
>> - Checking GLOBAL_ROOT_UID instead of having a capable call.
>>
>> The file permission on cgroup.procs seem just sufficient to keep
>> to keep those bugs from being easily exploitable.
>>
>> Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support

2014-10-19 Thread Eric W.Biederman


On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski l...@amacapital.net wrote:
On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Serge E. Hallyn se...@hallyn.com writes:

 Quoting Aditya Kali (adityak...@google.com):
 On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn se...@hallyn.com
wrote:
  Quoting Aditya Kali (adityak...@google.com):
  setns on a cgroup namespace is allowed only if
  * task has CAP_SYS_ADMIN in its current user-namespace and
over the user-namespace associated with target cgroupns.
  * task's current cgroup is descendent of the target
cgroupns-root
cgroup.
 
  What is the point of this?
 
  If I'm a user logged into
  /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
  a container which is in
  /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
  then I will want to be able to enter the container's cgroup.
  The container's cgroup root is under my own (satisfying the
  below condition0 but my cgroup is not a descendent of the
  container's cgroup.
 
 This condition is there because we don't want to do implicit cgroup
 changes when a process attaches to another cgroupns. cgroupns tries
to
 preserve the invariant that at any point, your current cgroup is
 always under the cgroupns-root of your cgroup namespace. But in
your
 example, if we allow a process in session-c12.scope container to
 attach to cgroupns root'ed at session-c12.scope/x1 container
 (without implicitly moving its cgroup), then this invariant won't
 hold.

 Oh, I see.  Guess that should be workable.  Thanks.

 Which has me looking at what the rules are for moving through
 the cgroup hierarchy.

 As long as we have write access to cgroup.procs and are allowed
 to open the file for write, we can move any of our own tasks
 into the cgroup.  So the cgroup namespace rules don't seem
 to be a problem.

 Andy can you please take a look at the permission checks in
 __cgroup_procs_write.

The actual requirements for calling that function haven't changed,
right?  IOW, what does this have to do with cgroupns?

Excluding user namespaces the requirements have not changed.  

The immediate correlation is that to enter a cgroupns you must first put your 
process in one of it's cgroups.

So I was examining what it would take to enter the cgroup of cgroupns.

 Is the idea
that you want a privileged user wrt a cgroupns's userns to be able to
use this?  If so:

Yes, that current_cred() thing is bogus.  (Actually, this is probably
exploitable right now if any cgroup.procs inode anywhere on the system
lets non-root write.)  (Can we have some kernel debugging option that
makes any use of current_cred() in write(2) warn?)

We really need a weaker version of may_ptrace for this kind of stuff.
Maybe the existing may_ptrace stuff is okay, actually.  But this is
completely missing group checks, cap checks, capabilities wrt the
userns, etc.

Also, I think that, if this version of the patchset allows non-init
userns to unshare cgroupns, then the issue of what permission is
needed to lock the cgroup hierarchy like that needs to be addressed,
because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
the calling task with no permission required.  Bolting on a fix later
will be a mess.

I imagine the pinning would be like the userns.

Ah but there is a potentially serious issue with the pinning.
With pinning we can make it impossible for root to move us to a different 
cgroup. 

I am not certain how serious that is but it bears thinking about.
If we don't implement pinning we should be able to implent everything with just 
filesystem mount options, and no new namespace required.

Sigh.

I am too tired tonight to see the end game in this.

Eric
 As I read the code I see 3 security gaffaws in the permssion check.
 - Using current-cred instead of file-f_cred.
 - Not checking tcred-euid.
 - Checking GLOBAL_ROOT_UID instead of having a capable call.

 The file permission on cgroup.procs seem just sufficient to keep
 to keep those bugs from being easily exploitable.

 Eric

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/