Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread महेश बंडेवार

On Fri, Nov 10, 2017 at 1:46 PM, Serge E. Hallyn  wrote:
> Quoting Eric W. Biederman (ebied...@xmission.com):
>> single sandbox.  I am not at all certain that the capabilities is the
>> proper place to limit code reachability.
>
> Right, I keep having this gut feeling that there is another way we
> should be doing that.  Maybe based on ksplice or perf, or maybe more
> based on subsystems.  And I hope someone pursues that.  But I can't put
> my finger on it, and meanwhile the capability checks obviously *are* in
> fact gates...
>
Well, I don't mind if there is a better solution available. The
proposed solution does not add much code or complexity: it uses a
bit and a sysctl, and will sit dormant until needed. When we have a
complete solution, this addition should not be a burden to maintain
because of its non-invasive footprint.

I will push the next version of the patch-set that implements Serge's finding.

Thanks,
--mahesh..

[PS: I'll soon be traveling again and moving to an area where
connectivity will be scarce / unreliable. So please expect a lot more
delays in my responses.]

> -serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> single sandbox.  I am not at all certain that the capabilities is the
> proper place to limit code reachability.

Right, I keep having this gut feeling that there is another way we
should be doing that.  Maybe based on ksplice or perf, or maybe more
based on subsystems.  And I hope someone pursues that.  But I can't put
my finger on it, and meanwhile the capability checks obviously *are* in
fact gates...

-serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread महेश बंडेवार

On Fri, Nov 10, 2017 at 6:58 AM, Eric W. Biederman
 wrote:
> "Mahesh Bandewar (महेश बंडेवार)"  writes:
>
>> [resend response as earlier one failed because of formatting issues]
>>
>> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn  wrote:
>>>
>>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) 
>>> wrote:
>>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
>>> >  wrote:
>>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश 
>>> > > बंडेवार) wrote:
>>> > >> Sorry folks I was traveling and seems like lot happened on this 
>>> > >> thread. :p
>>> > >>
>>> > >> I will try to response few of these comments selectively -
>>> > >>
>>> > >> > The thing that makes me hesitate with this set is that it is a
>>> > >> > permanent new feature to address what (I hope) is a temporary
>>> > >> > problem.
>>> > >> I agree this is permanent new feature but it's not solving a temporary
>>> > >> problem. It's impossible to assess what and when new vulnerability
>>> > >> that could show up. I think Daniel summed it up appropriately in his
>>> > >> response
>>> > >>
>>> > >> > Seems like there are two naive ways to do it, the first being to just
>>> > >> > look at all code under ns_capable() plus code called from there.  It
>>> > >> > seems like looking at the result of that could be fruitful.
>>> > >> This is really hard. The main issue that there were features designed
>>> > >> and developed before user-ns days with an assumption that unprivileged
>>> > >> users will never get certain capabilities which only root user gets.
>>> > >> Now that is not true anymore with user-ns creation with mapping root
>>> > >> for any process. Also at the same time blocking user-ns creation for
>>> > >> eveyone is a big-hammer which is not needed too. So it's not that easy
>>> > >> to just perform a code-walk-though and correct those decisions now.
>>> > >>
>>> > >> > It seems to me that the existing control in
>>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct 
>>> > >> > tape
>>> > >> > in that case.
>>> > >> This solution is essentially blocking unprivileged users from using
>>> > >> the user-namespaces entirely. This is not really a solution that can
>>> > >> work. The solution that this patch-set adds allows unprivileged users
>>> > >> to create user-namespaces. Actually the proposed solution is more
>>> > >> fine-grained approach than the unprivileged_userns_clone solution
>>> > >> since you can selectively block capabilities rather than completely
>>> > >> blocking the functionality.
>>> > >
>>> > > I've been talking to Stéphane today about this and we should also keep 
>>> > > in mind
>>> > > that we have:
>>> > >
>>> > > chb@conventiont|~
>>> > >> ls -al /proc/sys/user/
>>> > > total 0
>>> > > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
>>> > > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
>>> > >
>>> > > These files allow you to limit the number of namespaces that can be 
>>> > > created
>>> > > *per namespace* type. So let's say your system runs a bunch of user 
>>> > > namespaces
>>> > > you can do:
>>> > >
>>> > > chb@conventiont|~
>>> > >> echo 0 > /proc/sys/user/max_user_namespaces
>>> > >
>>> > > So that the next time you try to create a user namespaces you'd see:
>>> > >
>>> > > chb@conventiont|~
>>> > >> unshare -U
>>> > > unshare: unshare failed: No space left on device
>>> > >
>>> > > So there's not even a need to upstream a new sysctl since we have ways 
>>> > > of
>>> > > blocking this.
>>> > >
>>> > I'm not sure how it's solving the problem that my patch-set is addressing?
>>> > I agree though that the need for unprivileged_userns_clone sysctl goes
>>> > away as this is equivalent to setting that sysctl to 0 as you have
>>> > described above.
>>>
>>> oh right that was the reasoning iirc for not needing the other sysctl.
>>>
>>> > However as I mentioned earlier, blocking processes from creating
>>> > user-namespaces is not the solution. Processes should be able to
>>> > create namespaces as they are designed but at the same time we need to
>>> > have controls to 'contain' them if a need arise. Setting max_no to 0
>>> > is not the solution that I'm looking for since it doesn't solve the
>>> > problem.
>>>
>>> well yesterday we were told that was explicitly not the goal, but that was
>>> not by you ... i just mention it to explain why we seem to be walking in
>>> circles a bit.

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Eric W. Biederman

"Mahesh Bandewar (महेश बंडेवार)"  writes:

> [resend response as earlier one failed because of formatting issues]
>
> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn  wrote:
>>
>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) 
>> wrote:
>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
>> >  wrote:
>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) 
>> > > wrote:
>> > >> Sorry folks I was traveling and seems like lot happened on this thread. 
>> > >> :p
>> > >>
>> > >> I will try to response few of these comments selectively -
>> > >>
>> > >> > The thing that makes me hesitate with this set is that it is a
>> > >> > permanent new feature to address what (I hope) is a temporary
>> > >> > problem.
>> > >> I agree this is permanent new feature but it's not solving a temporary
>> > >> problem. It's impossible to assess what and when new vulnerability
>> > >> that could show up. I think Daniel summed it up appropriately in his
>> > >> response
>> > >>
>> > >> > Seems like there are two naive ways to do it, the first being to just
>> > >> > look at all code under ns_capable() plus code called from there.  It
>> > >> > seems like looking at the result of that could be fruitful.
>> > >> This is really hard. The main issue that there were features designed
>> > >> and developed before user-ns days with an assumption that unprivileged
>> > >> users will never get certain capabilities which only root user gets.
>> > >> Now that is not true anymore with user-ns creation with mapping root
>> > >> for any process. Also at the same time blocking user-ns creation for
>> > >> eveyone is a big-hammer which is not needed too. So it's not that easy
>> > >> to just perform a code-walk-though and correct those decisions now.
>> > >>
>> > >> > It seems to me that the existing control in
>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct 
>> > >> > tape
>> > >> > in that case.
>> > >> This solution is essentially blocking unprivileged users from using
>> > >> the user-namespaces entirely. This is not really a solution that can
>> > >> work. The solution that this patch-set adds allows unprivileged users
>> > >> to create user-namespaces. Actually the proposed solution is more
>> > >> fine-grained approach than the unprivileged_userns_clone solution
>> > >> since you can selectively block capabilities rather than completely
>> > >> blocking the functionality.
>> > >
>> > > I've been talking to Stéphane today about this and we should also keep 
>> > > in mind
>> > > that we have:
>> > >
>> > > chb@conventiont|~
>> > >> ls -al /proc/sys/user/
>> > > total 0
>> > > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
>> > > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
>> > >
>> > > These files allow you to limit the number of namespaces that can be 
>> > > created
>> > > *per namespace* type. So let's say your system runs a bunch of user 
>> > > namespaces
>> > > you can do:
>> > >
>> > > chb@conventiont|~
>> > >> echo 0 > /proc/sys/user/max_user_namespaces
>> > >
>> > > So that the next time you try to create a user namespaces you'd see:
>> > >
>> > > chb@conventiont|~
>> > >> unshare -U
>> > > unshare: unshare failed: No space left on device
>> > >
>> > > So there's not even a need to upstream a new sysctl since we have ways of
>> > > blocking this.
>> > >
>> > I'm not sure how it's solving the problem that my patch-set is addressing?
>> > I agree though that the need for unprivileged_userns_clone sysctl goes
>> > away as this is equivalent to setting that sysctl to 0 as you have
>> > described above.
>>
>> oh right that was the reasoning iirc for not needing the other sysctl.
>>
>> > However as I mentioned earlier, blocking processes from creating
>> > user-namespaces is not the solution. Processes should be able to
>> > create namespaces as they are designed but at the same time we need to
>> > have controls to 'contain' them if a need arise. Setting max_no to 0
>> > is not the solution that I'm looking for since it doesn't solve the
>> > problem.
>>
>> well yesterday we were told that was explicitly not the goal, but that was
>> not by you ... i just mention it to explain why we seem to be walking in
>> circles a bit.
>>
>> anyway the bounding set doesn't actually make sense so forget that.   the
>> question then is just whether it makes sense to allow things to continue
>> at all in t

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread chris hyser


On 11/09/2017 01:05 PM, Serge E. Hallyn wrote:

> Would the existing capability bounding set not suffice for that?
>
> The 'permanent' bounding set turns out to not be a good fit for
> the problem being discussed in this thread, but please feel free
> to start a new thread if you want to discuss your use case.


Sure. I will formulate something for a new thread. What seems to be
asked for here is a way to globally patch the capability sets of an
entire process subtree.


-chrish

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Serge E. Hallyn

Quoting chris hyser (chris.hy...@oracle.com):
> On 11/06/2017 10:23 PM, Serge E. Hallyn wrote:
> >I think I definately prefer what I mentioned in the email to Boris.
> >Basically a "permanent capability bounding set".  The normal bounding
> >set gets reset to a full set on every new user_ns creation.  In this
> >proposal, it would instead be set to the calling task's permanent
> >capability set, which starts (at boot) full, and which privileged
> >tasks can pull capabilities out of.
> 
> Actually, this may solve a similar problem I've been looking at. The
> idea was basically at strategic points in the kernel (possibly LSM
> hook sites, still evaluating, and probably syscall entry) validate
> that a task has not "magically" acquired capabilities that it or
> parent specifically said it cannot have and then take some action
> like say killing it immediately. Using your terms, basically make
> the "permanent capability set" a write-once privilege escalation
> defense. To handle the 0-day threat, perhaps make it writable but
> only with more "restrictive" values.

Would the existing capability bounding set not suffice for that?

The 'permanent' bounding set turns out to not be a good fit for
the problem being discussed in this thread, but please feel free
to start a new thread if you want to discuss your use case.

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread chris hyser


On 11/06/2017 10:23 PM, Serge E. Hallyn wrote:

> I think I definately prefer what I mentioned in the email to Boris.
> Basically a "permanent capability bounding set".  The normal bounding
> set gets reset to a full set on every new user_ns creation.  In this
> proposal, it would instead be set to the calling task's permanent
> capability set, which starts (at boot) full, and which privileged
> tasks can pull capabilities out of.


Actually, this may solve a similar problem I've been looking at. The
idea was basically to validate, at strategic points in the kernel
(possibly LSM hook sites, still evaluating, and probably syscall entry),
that a task has not "magically" acquired capabilities that it or its
parent specifically said it cannot have, and then take some action such
as killing it immediately. Using your terms, basically make the
"permanent capability set" a write-once privilege escalation defense. To
handle the 0-day threat, perhaps make it writable but only with more
"restrictive" values.


-chrish
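
The "writable but only with more restrictive values" idea above can be modeled in a few lines. This is a rough sketch only, not kernel code: the class name, the method names, and the 38-bit width of the full set are assumptions made for illustration. The bit positions for CAP_NET_RAW (13) and CAP_SYS_ADMIN (21) follow the Linux capability numbering.

```python
# Toy model of a "permanent capability set" that can only shrink.
# Everything here is hypothetical; it only mirrors the semantics
# discussed in the thread.

CAP_NET_RAW = 1 << 13
CAP_SYS_ADMIN = 1 << 21
FULL_SET = (1 << 38) - 1  # assumed width, for illustration only


class PermanentCapSet:
    def __init__(self, mask=FULL_SET):
        # Starts full (as at boot in the proposal).
        self.mask = mask

    def restrict(self, new_mask):
        """Accept new_mask only if it adds no capability to the set."""
        if new_mask & ~self.mask:
            raise PermissionError("capabilities cannot be added back")
        self.mask = new_mask

    def bounding_set_for_new_userns(self):
        # In the proposal a new user namespace would start from this
        # set instead of from a full bounding set.
        return self.mask
```

A privileged task would call restrict() with a smaller mask; any later attempt to widen the set again is refused, which is what makes it usable as an escalation defense.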

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Serge E. Hallyn

Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com):
> Of course. Let's take an example of the CVE that I have mentioned in
> my cover-letter -
> CVE-2017-7308(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-7308).
> It's well documented and even has an
> exploit(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
> C program that demonstrates how it can be used against an unpatched
> kernel. There is a very nice blog
> post(https://googleprojectzero.blogspot.kr/2017/05/exploiting-linux-kernel-via-packet.html)
> about this vulnerability by Andrey Konovalov.

Ok, thanks.  It's a good example because the fix for this CVE actually
came by itself 
(http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/debian.master/changelog).
Normally multiple CVEs come at the same time, which would make a
workaround for one not helpful.  This is a good counter-example.

I'm going to maintain that I really don't like this.  But it looks
useful, so ack on the concept, I'll just have to look again at the
code now.  Thanks for indulging me.

-serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-08 Thread महेश बंडेवार

[resend response as earlier one failed because of formatting issues]

On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn  wrote:
>
> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) 
> wrote:
> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
> >  wrote:
> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) 
> > > wrote:
> > >> Sorry folks I was traveling and seems like lot happened on this thread. 
> > >> :p
> > >>
> > >> I will try to response few of these comments selectively -
> > >>
> > >> > The thing that makes me hesitate with this set is that it is a
> > >> > permanent new feature to address what (I hope) is a temporary
> > >> > problem.
> > >> I agree this is permanent new feature but it's not solving a temporary
> > >> problem. It's impossible to assess what and when new vulnerability
> > >> that could show up. I think Daniel summed it up appropriately in his
> > >> response
> > >>
> > >> > Seems like there are two naive ways to do it, the first being to just
> > >> > look at all code under ns_capable() plus code called from there.  It
> > >> > seems like looking at the result of that could be fruitful.
> > >> This is really hard. The main issue that there were features designed
> > >> and developed before user-ns days with an assumption that unprivileged
> > >> users will never get certain capabilities which only root user gets.
> > >> Now that is not true anymore with user-ns creation with mapping root
> > >> for any process. Also at the same time blocking user-ns creation for
> > >> eveyone is a big-hammer which is not needed too. So it's not that easy
> > >> to just perform a code-walk-though and correct those decisions now.
> > >>
> > >> > It seems to me that the existing control in
> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct 
> > >> > tape
> > >> > in that case.
> > >> This solution is essentially blocking unprivileged users from using
> > >> the user-namespaces entirely. This is not really a solution that can
> > >> work. The solution that this patch-set adds allows unprivileged users
> > >> to create user-namespaces. Actually the proposed solution is more
> > >> fine-grained approach than the unprivileged_userns_clone solution
> > >> since you can selectively block capabilities rather than completely
> > >> blocking the functionality.
> > >
> > > I've been talking to Stéphane today about this and we should also keep in 
> > > mind
> > > that we have:
> > >
> > > chb@conventiont|~
> > >> ls -al /proc/sys/user/
> > > total 0
> > > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
> > > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
> > >
> > > These files allow you to limit the number of namespaces that can be 
> > > created
> > > *per namespace* type. So let's say your system runs a bunch of user 
> > > namespaces
> > > you can do:
> > >
> > > chb@conventiont|~
> > >> echo 0 > /proc/sys/user/max_user_namespaces
> > >
> > > So that the next time you try to create a user namespaces you'd see:
> > >
> > > chb@conventiont|~
> > >> unshare -U
> > > unshare: unshare failed: No space left on device
> > >
> > > So there's not even a need to upstream a new sysctl since we have ways of
> > > blocking this.
> > >
> > I'm not sure how it's solving the problem that my patch-set is addressing?
> > I agree though that the need for unprivileged_userns_clone sysctl goes
> > away as this is equivalent to setting that sysctl to 0 as you have
> > described above.
>
> oh right that was the reasoning iirc for not needing the other sysctl.
>
> > However as I mentioned earlier, blocking processes from creating
> > user-namespaces is not the solution. Processes should be able to
> > create namespaces as they are designed but at the same time we need to
> > have controls to 'contain' them if a need arise. Setting max_no to 0
> > is not the solution that I'm looking for since it doesn't solve the
> > problem.
>
> well yesterday we were told that was explicitly not the goal, but that was
> not by you ... i just mention it to explain why we seem to be walking in
> circles a bit.
>
> anyway the bounding set doesn't actually make sense so forget that.   the
> question then is just whether it makes sense to allow things to continue
> at all in this situation.  would you mind indulging me by giving one or two
> concrete examples in the previous known cves of what capabilities you would
> have

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-08 Thread Serge E. Hallyn

On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
>  wrote:
> > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) 
> > wrote:
> >> Sorry folks I was traveling and seems like lot happened on this thread. :p
> >>
> >> I will try to response few of these comments selectively -
> >>
> >> > The thing that makes me hesitate with this set is that it is a
> >> > permanent new feature to address what (I hope) is a temporary
> >> > problem.
> >> I agree this is permanent new feature but it's not solving a temporary
> >> problem. It's impossible to assess what and when new vulnerability
> >> that could show up. I think Daniel summed it up appropriately in his
> >> response
> >>
> >> > Seems like there are two naive ways to do it, the first being to just
> >> > look at all code under ns_capable() plus code called from there.  It
> >> > seems like looking at the result of that could be fruitful.
> >> This is really hard. The main issue that there were features designed
> >> and developed before user-ns days with an assumption that unprivileged
> >> users will never get certain capabilities which only root user gets.
> >> Now that is not true anymore with user-ns creation with mapping root
> >> for any process. Also at the same time blocking user-ns creation for
> >> eveyone is a big-hammer which is not needed too. So it's not that easy
> >> to just perform a code-walk-though and correct those decisions now.
> >>
> >> > It seems to me that the existing control in
> >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> >> > in that case.
> >> This solution is essentially blocking unprivileged users from using
> >> the user-namespaces entirely. This is not really a solution that can
> >> work. The solution that this patch-set adds allows unprivileged users
> >> to create user-namespaces. Actually the proposed solution is more
> >> fine-grained approach than the unprivileged_userns_clone solution
> >> since you can selectively block capabilities rather than completely
> >> blocking the functionality.
> >
> > I've been talking to Stéphane today about this and we should also keep in 
> > mind
> > that we have:
> >
> > chb@conventiont|~
> >> ls -al /proc/sys/user/
> > total 0
> > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
> > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
> > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
> >
> > These files allow you to limit the number of namespaces that can be created
> > *per namespace* type. So let's say your system runs a bunch of user 
> > namespaces
> > you can do:
> >
> > chb@conventiont|~
> >> echo 0 > /proc/sys/user/max_user_namespaces
> >
> > So that the next time you try to create a user namespaces you'd see:
> >
> > chb@conventiont|~
> >> unshare -U
> > unshare: unshare failed: No space left on device
> >
> > So there's not even a need to upstream a new sysctl since we have ways of
> > blocking this.
> >
> I'm not sure how it's solving the problem that my patch-set is addressing?
> I agree though that the need for unprivileged_userns_clone sysctl goes
> away as this is equivalent to setting that sysctl to 0 as you have
> described above.

oh right that was the reasoning iirc for not needing the other sysctl.

> However as I mentioned earlier, blocking processes from creating
> user-namespaces is not the solution. Processes should be able to
> create namespaces as they are designed but at the same time we need to
> have controls to 'contain' them if a need arise. Setting max_no to 0
> is not the solution that I'm looking for since it doesn't solve the
> problem.

well yesterday we were told that was explicitly not the goal, but that was 
not by you ... i just mention it to explain why we seem to be walking in
circles a bit.

anyway the bounding set doesn't actually make sense so forget that.   the
question then is just whether it makes sense to allow things to continue
at all in this situation.  would you mind indulging me by giving one or two
concrete examples in the previous known cves of what capabilities you would
have dropped to allow the rest to continue to be safely used?

thanks,
serge
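
For reference, the per-type limits listed in the quoted /proc/sys/user/ directory can be inspected with a short script. This is a minimal sketch: the function names are made up for illustration, and the directory is a parameter so the code can be pointed at a copy of the files. The "No space left on device" message quoted above is the standard text for ENOSPC.

```python
from pathlib import Path


def read_namespace_limits(sysctl_dir="/proc/sys/user"):
    """Return {file name: limit} for every max_*_namespaces file."""
    limits = {}
    for entry in sorted(Path(sysctl_dir).glob("max_*_namespaces")):
        limits[entry.name] = int(entry.read_text().strip())
    return limits


def user_namespaces_blocked(limits):
    """True when the user namespace limit is 0, as in the quoted example."""
    return limits.get("max_user_namespaces") == 0


if __name__ == "__main__":
    for name, value in read_namespace_limits().items():
        print(f"{name} = {value}")
```

Note that this only reports whether creation is blocked outright; it says nothing about which capabilities a namespace that does get created may use, which is the point under dispute in the thread.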

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-08 Thread महेश बंडेवार

On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
 wrote:
> On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) 
> wrote:
>> Sorry folks I was traveling and seems like lot happened on this thread. :p
>>
>> I will try to response few of these comments selectively -
>>
>> > The thing that makes me hesitate with this set is that it is a
>> > permanent new feature to address what (I hope) is a temporary
>> > problem.
>> I agree this is permanent new feature but it's not solving a temporary
>> problem. It's impossible to assess what and when new vulnerability
>> that could show up. I think Daniel summed it up appropriately in his
>> response
>>
>> > Seems like there are two naive ways to do it, the first being to just
>> > look at all code under ns_capable() plus code called from there.  It
>> > seems like looking at the result of that could be fruitful.
>> This is really hard. The main issue that there were features designed
>> and developed before user-ns days with an assumption that unprivileged
>> users will never get certain capabilities which only root user gets.
>> Now that is not true anymore with user-ns creation with mapping root
>> for any process. Also at the same time blocking user-ns creation for
>> eveyone is a big-hammer which is not needed too. So it's not that easy
>> to just perform a code-walk-though and correct those decisions now.
>>
>> > It seems to me that the existing control in
>> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
>> > in that case.
>> This solution is essentially blocking unprivileged users from using
>> the user-namespaces entirely. This is not really a solution that can
>> work. The solution that this patch-set adds allows unprivileged users
>> to create user-namespaces. Actually the proposed solution is more
>> fine-grained approach than the unprivileged_userns_clone solution
>> since you can selectively block capabilities rather than completely
>> blocking the functionality.
>
> I've been talking to Stéphane today about this and we should also keep in mind
> that we have:
>
> chb@conventiont|~
>> ls -al /proc/sys/user/
> total 0
> dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
> dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
> -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
>
> These files allow you to limit the number of namespaces that can be created
> *per namespace* type. So let's say your system runs a bunch of user namespaces
> you can do:
>
> chb@conventiont|~
>> echo 0 > /proc/sys/user/max_user_namespaces
>
> So that the next time you try to create a user namespaces you'd see:
>
> chb@conventiont|~
>> unshare -U
> unshare: unshare failed: No space left on device
>
> So there's not even a need to upstream a new sysctl since we have ways of
> blocking this.
>
I'm not sure how that solves the problem my patch-set is addressing.
I agree, though, that the need for the unprivileged_userns_clone sysctl
goes away, as this is equivalent to setting that sysctl to 0 as you have
described above.
However, as I mentioned earlier, blocking processes from creating
user-namespaces is not the solution. Processes should be able to
create namespaces as they are designed to, but at the same time we need
controls to 'contain' them if the need arises. Setting max_no to 0
is not the solution that I'm looking for since it doesn't solve the
problem.

> Also I'd like to point out that a lot of capability checks and actual security
> vulnerabilities are associated with CAP_SYS_ADMIN. So what you likely want to 
> do
> is block CAP_SYS_ADMIN in user namespaces but at this point they become
> basically useless for a lot of interesting use cases. In addition, this patch
> would add another layer of complexity that is - imho - not really warranted
> given what we already have.
I disagree. I'm not sure how this patch adds complexity. The
functionality is maintained exactly as it is, with an extra knob
that allows you to take control back if a situation arises. Once the
kernel is patched for whatever was discovered, life returns to
normal by readjusting the knob. It's as simple as that!

> The relationship between capabilities and user
> namespaces should stay as simple as possible so that it stays maintainable.
> User namespaces already introduce a proper layer of complexity.
> Just my two cents. I might be totally off here of course.
The side effect of the solution is that you have sort-of scaled-down /
broken functionality without blocking the feature complete

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-08 Thread Christian Brauner

On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> Sorry folks, I was traveling and it seems like a lot happened on this thread. :p
> 
> I will try to respond to a few of these comments selectively -
> 
> > The thing that makes me hesitate with this set is that it is a
> > permanent new feature to address what (I hope) is a temporary
> > problem.
> I agree this is a permanent new feature, but it's not solving a temporary
> problem. It's impossible to predict which new vulnerability will show up,
> or when. I think Daniel summed it up appropriately in his
> response.
> 
> > Seems like there are two naive ways to do it, the first being to just
> > look at all code under ns_capable() plus code called from there.  It
> > seems like looking at the result of that could be fruitful.
> This is really hard. The main issue is that there were features designed
> and developed before user-ns days with the assumption that unprivileged
> users would never get certain capabilities that only the root user gets.
> That is no longer true now that any process can create a user-ns and map
> itself to root. At the same time, blocking user-ns creation for
> everyone is a big hammer that is not needed either. So it's not that easy
> to just perform a code walk-through and correct those decisions now.
> 
> > It seems to me that the existing control in
> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> > in that case.
> This solution essentially blocks unprivileged users from using
> user namespaces entirely. This is not really a solution that can
> work. The solution that this patch-set adds still allows unprivileged users
> to create user namespaces. The proposed solution is actually a more
> fine-grained approach than the unprivileged_userns_clone solution,
> since you can selectively block capabilities rather than completely
> blocking the functionality.

I've been talking to Stéphane today about this and we should also keep in mind
that we have:

chb@conventiont|~
> ls -al /proc/sys/user/
total 0
dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
-rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces

These files allow you to limit the number of namespaces that can be created
*per namespace* type. So let's say your system runs a bunch of user namespaces
you can do:

chb@conventiont|~
> echo 0 > /proc/sys/user/max_user_namespaces

So the next time you try to create a user namespace you'd see:

chb@conventiont|~
> unshare -U
unshare: unshare failed: No space left on device

So there's not even a need to upstream a new sysctl since we have ways of
blocking this.
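
For scripting the same check, a minimal Python sketch follows. It assumes
only the /proc/sys/user/max_*_namespaces layout shown in the listing above;
the function names are made up for illustration and are not part of any
existing tool.

```python
import os


def parse_namespace_limits(entries):
    """Map {"max_user_namespaces": "0\n", ...} to {"user": 0, ...}.

    Entries that are not max_*_namespaces files (for example
    max_inotify_watches) are ignored.
    """
    prefix, suffix = "max_", "_namespaces"
    limits = {}
    for name, text in entries.items():
        if name.startswith(prefix) and name.endswith(suffix):
            limits[name[len(prefix):-len(suffix)]] = int(text.strip())
    return limits


def read_namespace_limits(base="/proc/sys/user"):
    """Read every file under base and parse the namespace limits."""
    entries = {}
    for name in os.listdir(base):
        with open(os.path.join(base, name)) as f:
            entries[name] = f.read()
    return parse_namespace_limits(entries)


def userns_creation_blocked(limits):
    """True when the user namespace limit is 0, as after the echo above."""
    return limits.get("user") == 0
```

With the limit at 0, unshare(2) fails with ENOSPC, which is where the
"No space left on device" message in the transcript comes from.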

Also I'd like to point out that a lot of capability checks and actual security
vulnerabilities are associated with CAP_SYS_ADMIN. So what you likely want to do
is block CAP_SYS_ADMIN in user namespaces but at this point they become
basically useless for a lot of interesting use cases. In addition, this patch
would add another layer of complexity that is - imho - not really warranted
given what we already have. The relationship between capabilities and user
namespaces should stay as simple as possible so that it stays maintainable.
User namespaces already introduce a proper layer of complexity.
Just my two cents. I might be totally off here of course.

Christian

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-08 Thread महेश बंडेवार

Sorry folks, I was traveling and it seems like a lot happened on this thread. :p

I will try to respond to a few of these comments selectively -

> The thing that makes me hesitate with this set is that it is a
> permanent new feature to address what (I hope) is a temporary
> problem.
I agree this is a permanent new feature, but it's not solving a temporary
problem. It's impossible to predict which new vulnerability will show up,
or when. I think Daniel summed it up appropriately in his
response.

> Seems like there are two naive ways to do it, the first being to just
> look at all code under ns_capable() plus code called from there.  It
> seems like looking at the result of that could be fruitful.
This is really hard. The main issue is that there were features designed
and developed before user-ns days with the assumption that unprivileged
users would never get certain capabilities that only the root user gets.
That is no longer true now that any process can create a user-ns and map
itself to root. At the same time, blocking user-ns creation for
everyone is a big hammer that is not needed either. So it's not that easy
to just perform a code walk-through and correct those decisions now.

> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.
This solution essentially blocks unprivileged users from using
user namespaces entirely. This is not really a solution that can
work. The solution that this patch-set adds still allows unprivileged users
to create user namespaces. The proposed solution is actually a more
fine-grained approach than the unprivileged_userns_clone solution,
since you can selectively block capabilities rather than completely
blocking the functionality.
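
To make "selectively block capabilities" concrete, here is a toy userspace
model in Python, not kernel code. It assumes the knob is a bitmask of
capabilities that stay usable in user namespaces flagged as controlled; the
names cap_allowed, FULL_MASK, LOCKDOWN_MASK and the "controlled" flag are
hypothetical and only illustrate the idea. The capability numbers are the
ones from include/uapi/linux/capability.h.

```python
# Capability numbers from include/uapi/linux/capability.h.
CAP_NET_ADMIN = 12
CAP_SYS_CHROOT = 18
CAP_SYS_ADMIN = 21


def cap_bit(cap):
    """Bit for one capability in a capability mask."""
    return 1 << cap


def cap_allowed(cap, controlled, mask):
    """Hypothetical gate: in a 'controlled' user namespace a capability is
    usable only if its bit is set in the administrator-supplied mask.
    Uncontrolled namespaces are not affected by the mask at all."""
    if not controlled:
        return True
    return bool(mask & cap_bit(cap))


# Normal operation: every bit set, so nothing is blocked.
FULL_MASK = (1 << 64) - 1

# After a CAP_NET_ADMIN-reachable bug is published: clear just that bit.
LOCKDOWN_MASK = FULL_MASK & ~cap_bit(CAP_NET_ADMIN)
```

In this model, cap_allowed(CAP_NET_ADMIN, True, LOCKDOWN_MASK) is False while
CAP_SYS_CHROOT stays usable, and restoring FULL_MASK after the kernel is
patched returns everything to normal.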

> I meant each task has a perm_cap_bset next to the cap_bset.  So task
> p1 (if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset,
> p2 (if it has privilege) can drop CAP_NET_ADMIN.  When p1 creates a
> new user_ns, that init task has its cap_bset set to all caps but
> CAP_SYS_ADMIN.
>
> I think for simplicity perm_cap_bset would *only* affect the filling
> of cap_bset at user namespace creation.  So if you wanted to drop a
> capability from your own cap_bset as well, you'd have to do that
> separately.
My original intention is to reduce the attack surface when
vulnerabilities are discovered / published, but I don't see how this
solves that issue. Also, the reason to have a sysctl is to have a
simple control across the board to contain the situation. If that
is not addressed then we might need some other solution on top of
this.

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Serge E. Hallyn

On Mon, Nov 06, 2017 at 07:01:58PM -0500, Boris Lukashev wrote:
> On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn  wrote:
> > Quoting Boris Lukashev (blukas...@sempervictus.com):
> >> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn  wrote:
> >> > Quoting Daniel Micay (danielmi...@gmail.com):
> >> >> Substantial added attack surface will never go away as a problem. There
> >> >> aren't a finite number of vulnerabilities to be found.
> >> >
> >> > There's varying levels of usefulness and quality.  There is code which I
> >> > want to be able to use in a container, and code which I can't ever see a
> >> > reason for using there.  The latter, especially if it's also in a
> >> > staging driver, would be nice to have a toggle to disable.
> >> >
> >> > You're not advocating dropping the added attack surface, only adding a
> >> > way of dealing with an 0day after the fact.  Privilege raising 0days can
> >> > exist anywhere, not just in code which only root in a user namespace can
> >> > exercise.  So from that point of view, ksplice seems a more complete
> >> > solution.  Why not just actually fix the bad code block when we know
> >> > about it?
> >> >
> >> > Finally, it has been well argued that you can gain many new caps from
> >> > having only a few others.  Given that, how could you ever be sure that,
> >> > if an 0day is found which allows root in a user ns to abuse
> >> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> >> > would suffice?  It seems to me that the existing control in
> >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> >> > in that case.
> >> >
> >> > -serge
> >>
> >> This seems to be heading toward "we need full zones in Linux" with
> >> their own procfs and sysfs namespace and a stricter isolation model
> >> for resources and capabilities. So long as things can happen in a
> >> namespace which have a privileged relationship with host resources,
> >> this is going to be cat-and-mouse to one degree or another.
> >>
> >> Containers and namespaces don't have a one-to-one relationship, so I'm
> >> not sure that's the best term to use in the kernel security context
> >
> > Sorry - what's not the best term to use?
> 
> Pardon, "containers," since they're namespaces+system construct.
> 
> >
> >> since there's a bunch of userspace and implementation delta across the
> >> different systems (with their own security models and so forth).
> >> Without accounting for what a specific implementation may or may not
> >> do, and only looking at "how do we reduce privileged impact on parent
> >> context from unprivileged namespaces," this patch does seem to provide
> >> a logical way of reducing the privileges available in such a namespace
> >> and often needed to mount escapes/impact parent context.
> >
> > What different implementations do is irrelevant - as an unprivileged user
> > I can always, with no help, create a new user namespace mapping my current
> > uid to root, and exercise this code.  So the security model implemented
> > by a particular userspace namespace-using driver doesn't matter, as it
> > only restricts me if I choose to use it.
> >
> > But, I guess you're actually saying that some program might know that it
> > should never use network code so want to drop CAP_NET_*?  And you're
> > saying that a "global capability bounding set" might be useful?
> >
> 
> The "global capability bounding set" with forced inheritance can be
> used to prevent the vector you describe wherein the capability of UID
> 0 in the child NS is restricted from the parent implicitly, so yes,
> that nomenclature seems appropriate.
> 
> > Would it be better to actually implement it as a new bounding set that
> > is maintained across user namespace creations, but is per-task (inherited
> > by children of course)?  Instead of a sysctl?
> >
> > -serge
> 
> In line with the previous comment, the inheritance across subsequent
> invocations should be forced to prevent the context you described.
> Please pardon my ignorance, not sure what you mean in terms of
> "per-task" across namespace creation.

I meant each task has a perm_cap_bset next to the cap_bset.  So task
p1 (if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset,
p2 (if it has privilege) can drop CAP_NET_ADMIN.  When p1 creates a
new user_ns, that init task has its cap_bset set to all caps but
CAP_SYS_ADMIN.

I think for simplicity perm_cap_bset would *only* affect the filling
of cap_bset at user namespace creation.  So if you wanted to drop a
capability from your own cap_bset as well, you'd have to do that
separately.
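
A toy Python model of these semantics, as a sketch only: perm_cap_bset is the
proposed field described above, the Task class and its method names are
invented for illustration, and ALL_CAPS is a three-entry stand-in for the
full capability set.

```python
from dataclasses import dataclass, replace

# Small stand-in for the full capability set.
ALL_CAPS = frozenset({"CAP_SYS_ADMIN", "CAP_NET_ADMIN", "CAP_SYS_CHROOT"})


@dataclass(frozen=True)
class Task:
    cap_bset: frozenset = ALL_CAPS        # normal bounding set
    perm_cap_bset: frozenset = ALL_CAPS   # proposed permanent bounding set
    privileged: bool = False

    def drop_perm_cap(self, cap):
        """Only a privileged task may pull a cap out of perm_cap_bset.
        The task's own cap_bset is left alone, as described above."""
        if not self.privileged:
            raise PermissionError("need privilege to change perm_cap_bset")
        return replace(self, perm_cap_bset=self.perm_cap_bset - {cap})

    def fork(self):
        """Children inherit both sets unchanged."""
        return replace(self)

    def create_user_ns(self):
        """Without the proposal cap_bset would be reset to ALL_CAPS here.
        With it, cap_bset is filled from perm_cap_bset instead, and
        perm_cap_bset itself carries over into the new namespace."""
        return replace(self, cap_bset=self.perm_cap_bset)


p1 = Task(privileged=True).drop_perm_cap("CAP_SYS_ADMIN")
init_in_ns = p1.fork().create_user_ns()
```

Here p1 keeps CAP_SYS_ADMIN in its own cap_bset, while init_in_ns (and
anything nested below it) has a cap_bset with every cap except CAP_SYS_ADMIN.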

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Serge E. Hallyn

On Mon, Nov 06, 2017 at 09:16:03PM -0500, Daniel Micay wrote:
> On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> > Quoting Daniel Micay (danielmi...@gmail.com):
> > > Substantial added attack surface will never go away as a problem. There
> > > aren't a finite number of vulnerabilities to be found.
> > 
> > There's varying levels of usefulness and quality.  There is code which I
> > want to be able to use in a container, and code which I can't ever see a
> > reason for using there.  The latter, especially if it's also in a
> > staging driver, would be nice to have a toggle to disable.
> > 
> > You're not advocating dropping the added attack surface, only adding a
> > way of dealing with an 0day after the fact.  Privilege raising 0days can
> > exist anywhere, not just in code which only root in a user namespace can
> > exercise.  So from that point of view, ksplice seems a more complete
> > solution.  Why not just actually fix the bad code block when we know
> > about it?
> 
> That's not what I'm advocating. I only care about it for proactive
> attack surface reduction downstream. I have no interest in using it to
> block access to known vulnerabilities.
> 
> > Finally, it has been well argued that you can gain many new caps from
> > having only a few others.  Given that, how could you ever be sure that,
> > if an 0day is found which allows root in a user ns to abuse
> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> > would suffice?
> 
> I didn't suggest using it that way...
> 
> > It seems to me that the existing control in
> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> > in that case.
> 
> There's no such thing as unprivileged_userns_clone in mainline.

Hm.  I was sure Kees had gotten that in...  I guess I was wrong.

> The advantage of this over unprivileged_userns_clone in Debian and maybe
> some other distributions is not giving up unprivileged app containers /
> sandboxes implemented via user namespaces.  For example, Chromium's user
> namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium
> will be dropping their setuid sandbox, forcing usage of user namespaces
> to avoid losing the sandbox which will greatly increase local kernel
> attack surface on the host by exposing netfilter management, etc. to
> unprivileged users.
> 
> The proposed approach isn't necessarily the best way to implement this
> kind of mitigation but I think it's filling a real need.

I think I definitely prefer what I mentioned in the email to Boris.
Basically a "permanent capability bounding set".  The normal bounding
set gets reset to a full set on every new user_ns creation.  In this
proposal, it would instead be set to the calling task's permanent
capability set, which starts (at boot) full, and which privileged
tasks can pull capabilities out of.

-serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Daniel Micay

On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem. There
> > aren't a finite number of vulnerabilities to be found.
> 
> There's varying levels of usefulness and quality.  There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there.  The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
> 
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact.  Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise.  So from that point of view, ksplice seems a more complete
> solution.  Why not just actually fix the bad code block when we know
> about it?

That's not what I'm advocating. I only care about it for proactive
attack surface reduction downstream. I have no interest in using it to
block access to known vulnerabilities.

> Finally, it has been well argued that you can gain many new caps from
> having only a few others.  Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?

I didn't suggest using it that way...

> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.

There's no such thing as unprivileged_userns_clone in mainline.

The advantage of this over unprivileged_userns_clone in Debian and maybe
some other distributions is not giving up unprivileged app containers /
sandboxes implemented via user namespaces.  For example, Chromium's user
namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium
will be dropping their setuid sandbox, forcing usage of user namespaces
to avoid losing the sandbox which will greatly increase local kernel
attack surface on the host by exposing netfilter management, etc. to
unprivileged users.

The proposed approach isn't necessarily the best way to implement this
kind of mitigation but I think it's filling a real need.

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Boris Lukashev

On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn  wrote:
> Quoting Boris Lukashev (blukas...@sempervictus.com):
>> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn  wrote:
>> > Quoting Daniel Micay (danielmi...@gmail.com):
>> >> Substantial added attack surface will never go away as a problem. There
>> >> aren't a finite number of vulnerabilities to be found.
>> >
>> > There's varying levels of usefulness and quality.  There is code which I
>> > want to be able to use in a container, and code which I can't ever see a
>> > reason for using there.  The latter, especially if it's also in a
>> > staging driver, would be nice to have a toggle to disable.
>> >
>> > You're not advocating dropping the added attack surface, only adding a
>> > way of dealing with an 0day after the fact.  Privilege raising 0days can
>> > exist anywhere, not just in code which only root in a user namespace can
>> > exercise.  So from that point of view, ksplice seems a more complete
>> > solution.  Why not just actually fix the bad code block when we know
>> > about it?
>> >
>> > Finally, it has been well argued that you can gain many new caps from
>> > having only a few others.  Given that, how could you ever be sure that,
>> > if an 0day is found which allows root in a user ns to abuse
>> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
>> > would suffice?  It seems to me that the existing control in
>> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
>> > in that case.
>> >
>> > -serge
>>
>> This seems to be heading toward "we need full zones in Linux" with
>> their own procfs and sysfs namespace and a stricter isolation model
>> for resources and capabilities. So long as things can happen in a
>> namespace which have a privileged relationship with host resources,
>> this is going to be cat-and-mouse to one degree or another.
>>
>> Containers and namespaces don't have a one-to-one relationship, so I'm
>> not sure that's the best term to use in the kernel security context
>
> Sorry - what's not the best term to use?

Pardon, "containers," since they're namespaces+system construct.

>
>> since there's a bunch of userspace and implementation delta across the
>> different systems (with their own security models and so forth).
>> Without accounting for what a specific implementation may or may not
>> do, and only looking at "how do we reduce privileged impact on parent
>> context from unprivileged namespaces," this patch does seem to provide
>> a logical way of reducing the privileges available in such a namespace
>> and often needed to mount escapes/impact parent context.
>
> What different implementations do is irrelevant - as an unprivileged user
> I can always, with no help, create a new user namespace mapping my current
> uid to root, and exercise this code.  So the security model implemented
> by a particular userspace namespace-using driver doesn't matter, as it
> only restricts me if I choose to use it.
>
> But, I guess you're actually saying that some program might know that it
> should never use network code so want to drop CAP_NET_*?  And you're
> saying that a "global capability bounding set" might be useful?
>

The "global capability bounding set" with forced inheritance can be
used to prevent the vector you describe wherein the capability of UID
0 in the child NS is restricted from the parent implicitly, so yes,
that nomenclature seems appropriate.

> Would it be better to actually implement it as a new bounding set that
> is maintained across user namespace creations, but is per-task (inherited
> by children of course)?  Instead of a sysctl?
>
> -serge

In line with the previous comment, the inheritance across subsequent
invocations should be forced to prevent the context you described.
Please pardon my ignorance, not sure what you mean in terms of
"per-task" across namespace creation.

-Boris

-- 
Boris Lukashev
Systems Architect
Semper Victus

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Serge E. Hallyn

Quoting Boris Lukashev (blukas...@sempervictus.com):
> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn  wrote:
> > Quoting Daniel Micay (danielmi...@gmail.com):
> >> Substantial added attack surface will never go away as a problem. There
> >> aren't a finite number of vulnerabilities to be found.
> >
> > There's varying levels of usefulness and quality.  There is code which I
> > want to be able to use in a container, and code which I can't ever see a
> > reason for using there.  The latter, especially if it's also in a
> > staging driver, would be nice to have a toggle to disable.
> >
> > You're not advocating dropping the added attack surface, only adding a
> > way of dealing with an 0day after the fact.  Privilege raising 0days can
> > exist anywhere, not just in code which only root in a user namespace can
> > exercise.  So from that point of view, ksplice seems a more complete
> > solution.  Why not just actually fix the bad code block when we know
> > about it?
> >
> > Finally, it has been well argued that you can gain many new caps from
> > having only a few others.  Given that, how could you ever be sure that,
> > if an 0day is found which allows root in a user ns to abuse
> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> > would suffice?  It seems to me that the existing control in
> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> > in that case.
> >
> > -serge
> 
> This seems to be heading toward "we need full zones in Linux" with
> their own procfs and sysfs namespace and a stricter isolation model
> for resources and capabilities. So long as things can happen in a
> namespace which have a privileged relationship with host resources,
> this is going to be cat-and-mouse to one degree or another.
> 
> Containers and namespaces don't have a one-to-one relationship, so I'm
> not sure that's the best term to use in the kernel security context

Sorry - what's not the best term to use?

> since there's a bunch of userspace and implementation delta across the
> different systems (with their own security models and so forth).
> Without accounting for what a specific implementation may or may not
> do, and only looking at "how do we reduce privileged impact on parent
> context from unprivileged namespaces," this patch does seem to provide
> a logical way of reducing the privileges available in such a namespace
> and often needed to mount escapes/impact parent context.

What different implementations do is irrelevant - as an unprivileged user
I can always, with no help, create a new user namespace mapping my current
uid to root, and exercise this code.  So the security model implemented
by a particular userspace namespace-using driver doesn't matter, as it
only restricts me if I choose to use it.
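
The point above, that any unprivileged user can do this without help, can be
sketched in Python. This is an illustration only, assuming a kernel that
permits unprivileged user namespaces; id_map_line and
become_root_in_new_userns are names chosen here, and unshare(2) is reached
through ctypes.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # from <linux/sched.h>


def id_map_line(inside_id, outside_id, count=1):
    """One uid_map/gid_map line: ID inside the ns, ID outside, range length."""
    return f"{inside_id} {outside_id} {count}\n"


def become_root_in_new_userns():
    """Create a user namespace and map the caller's uid/gid to 0 inside it.

    Must run in a single-threaded process. Returns the uid seen afterwards.
    """
    # Record the outer IDs first; after unshare they read as the overflow ID.
    uid, gid = os.getuid(), os.getgid()
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWUSER) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # An unprivileged writer must deny setgroups before writing gid_map.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(id_map_line(0, uid))
    with open("/proc/self/gid_map", "w") as f:
        f.write(id_map_line(0, gid))
    return os.getuid()
```

If the call succeeds the process is uid 0 inside the new namespace and holds
a full capability set there, which is what makes the ns_capable() code paths
reachable.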

But, I guess you're actually saying that some program might know that it
should never use network code so want to drop CAP_NET_*?  And you're
saying that a "global capability bounding set" might be useful?

Would it be better to actually implement it as a new bounding set that
is maintained across user namespace creations, but is per-task (inherited
by children of course)?  Instead of a sysctl?

-serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Boris Lukashev

On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn  wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
>> Substantial added attack surface will never go away as a problem. There
>> aren't a finite number of vulnerabilities to be found.
>
> There's varying levels of usefulness and quality.  There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there.  The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
>
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact.  Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise.  So from that point of view, ksplice seems a more complete
> solution.  Why not just actually fix the bad code block when we know
> about it?
>
> Finally, it has been well argued that you can gain many new caps from
> having only a few others.  Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?  It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.
>
> -serge

This seems to be heading toward "we need full zones in Linux" with
their own procfs and sysfs namespace and a stricter isolation model
for resources and capabilities. So long as things can happen in a
namespace which have a privileged relationship with host resources,
this is going to be cat-and-mouse to one degree or another.

Containers and namespaces don't have a one-to-one relationship, so I'm
not sure that's the best term to use in the kernel security context
since there's a bunch of userspace and implementation delta across the
different systems (with their own security models and so forth).
Without accounting for what a specific implementation may or may not
do, and only looking at "how do we reduce privileged impact on parent
context from unprivileged namespaces," this patch does seem to provide
a logical way of reducing the privileges available in such a namespace
and often needed to mount escapes/impact parent context.

-Boris

-- 
Boris Lukashev
Systems Architect
Semper Victus

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Christian Brauner

On Mon, Nov 06, 2017 at 04:14:18PM -0600, Serge Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem. There
> > aren't a finite number of vulnerabilities to be found.
> 
> There's varying levels of usefulness and quality.  There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there.  The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
> 
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact.  Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise.  So from that point of view, ksplice seems a more complete
> solution.  Why not just actually fix the bad code block when we know
> about it?
> 
> Finally, it has been well argued that you can gain many new caps from
> having only a few others.  Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?  It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.

I agree that /proc/sys/kernel/unprivileged_userns_clone is the most reasonable
thing to do. This patch introduces a layer of complexity to fine-tune user
namespace creation that - in the relevant security critical scenario - should
simply be turned off entirely.

Is /proc/sys/kernel/unprivileged_userns_clone upstreamed or is this still only
carried downstream?

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Serge E. Hallyn

Quoting Daniel Micay (danielmi...@gmail.com):
> Substantial added attack surface will never go away as a problem. There
> aren't a finite number of vulnerabilities to be found.

There's varying levels of usefulness and quality.  There is code which I
want to be able to use in a container, and code which I can't ever see a
reason for using there.  The latter, especially if it's also in a
staging driver, would be nice to have a toggle to disable.

You're not advocating dropping the added attack surface, only adding a
way of dealing with an 0day after the fact.  Privilege raising 0days can
exist anywhere, not just in code which only root in a user namespace can
exercise.  So from that point of view, ksplice seems a more complete
solution.  Why not just actually fix the bad code block when we know
about it?

Finally, it has been well argued that you can gain many new caps from
having only a few others.  Given that, how could you ever be sure that,
if an 0day is found which allows root in a user ns to abuse
CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
would suffice?  It seems to me that the existing control in
/proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
in that case.

-serge

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Daniel Micay

Substantial added attack surface will never go away as a problem. There
aren't a finite number of vulnerabilities to be found.
