Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Fri, Nov 10, 2017 at 1:46 PM, Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebied...@xmission.com):
>> single sandbox. I am not at all certain that the capabilities is the
>> proper place to limit code reachability.
>
> Right, I keep having this gut feeling that there is another way we
> should be doing that. Maybe based on ksplice or perf, or maybe more
> based on subsystems. And I hope someone pursues that. But I can't put
> my finger on it, and meanwhile the capability checks obviously *are* in
> fact gates...
>
Well, I don't mind if there is a better solution available. The proposed
solution does not add much code or complexity: it uses a bit and a sysctl,
and will sit dormant. When we have a complete solution, this addition
should not be a burden to maintain because of its non-invasive footprint.

I will push the next version of the patch-set that implements Serge's
finding.

Thanks,
--mahesh..

[PS: I'll soon be traveling again and moving to an area where
connectivity will be scarce / unreliable. So please expect a lot more
delays in my responses.]

> -serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Eric W. Biederman (ebied...@xmission.com):
> single sandbox. I am not at all certain that the capabilities is the
> proper place to limit code reachability.

Right, I keep having this gut feeling that there is another way we
should be doing that. Maybe based on ksplice or perf, or maybe more
based on subsystems. And I hope someone pursues that. But I can't put
my finger on it, and meanwhile the capability checks obviously *are* in
fact gates...

-serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Fri, Nov 10, 2017 at 6:58 AM, Eric W. Biederman wrote: > "Mahesh Bandewar (महेश बंडेवार)" writes: > >> [resend response as earlier one failed because of formatting issues] >> >> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn wrote: >>> >>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) >>> wrote: >>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner >>> > wrote: >>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश >>> > > बंडेवार) wrote: >>> > >> Sorry folks I was traveling and seems like lot happened on this >>> > >> thread. :p >>> > >> >>> > >> I will try to response few of these comments selectively - >>> > >> >>> > >> > The thing that makes me hesitate with this set is that it is a >>> > >> > permanent new feature to address what (I hope) is a temporary >>> > >> > problem. >>> > >> I agree this is permanent new feature but it's not solving a temporary >>> > >> problem. It's impossible to assess what and when new vulnerability >>> > >> that could show up. I think Daniel summed it up appropriately in his >>> > >> response >>> > >> >>> > >> > Seems like there are two naive ways to do it, the first being to just >>> > >> > look at all code under ns_capable() plus code called from there. It >>> > >> > seems like looking at the result of that could be fruitful. >>> > >> This is really hard. The main issue that there were features designed >>> > >> and developed before user-ns days with an assumption that unprivileged >>> > >> users will never get certain capabilities which only root user gets. >>> > >> Now that is not true anymore with user-ns creation with mapping root >>> > >> for any process. Also at the same time blocking user-ns creation for >>> > >> eveyone is a big-hammer which is not needed too. So it's not that easy >>> > >> to just perform a code-walk-though and correct those decisions now. 
>>> > >> >>> > >> > It seems to me that the existing control in >>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct >>> > >> > tape >>> > >> > in that case. >>> > >> This solution is essentially blocking unprivileged users from using >>> > >> the user-namespaces entirely. This is not really a solution that can >>> > >> work. The solution that this patch-set adds allows unprivileged users >>> > >> to create user-namespaces. Actually the proposed solution is more >>> > >> fine-grained approach than the unprivileged_userns_clone solution >>> > >> since you can selectively block capabilities rather than completely >>> > >> blocking the functionality. >>> > > >>> > > I've been talking to Stéphane today about this and we should also keep >>> > > in mind >>> > > that we have: >>> > > >>> > > chb@conventiont|~ >>> > >> ls -al /proc/sys/user/ >>> > > total 0 >>> > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 . >>> > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 .. >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces >>> > > >>> > > These files allow you to limit the number of namespaces that can be >>> > > created >>> > > *per namespace* type. 
So let's say your system runs a bunch of user >>> > > namespaces >>> > > you can do: >>> > > >>> > > chb@conventiont|~ >>> > >> echo 0 > /proc/sys/user/max_user_namespaces >>> > > >>> > > So that the next time you try to create a user namespaces you'd see: >>> > > >>> > > chb@conventiont|~ >>> > >> unshare -U >>> > > unshare: unshare failed: No space left on device >>> > > >>> > > So there's not even a need to upstream a new sysctl since we have ways >>> > > of >>> > > blocking this. >>> > > >>> > I'm not sure how it's solving the problem that my patch-set is addressing? >>> > I agree though that the need for unprivileged_userns_clone sysctl goes >>> > away as this is equivalent to setting that sysctl to 0 as you have >>> > described above. >>> >>> oh right that was the reasoning iirc for not needing the other sysctl. >>> >>> > However as I mentioned earlier, blocking processes from creating >>> > user-namespaces is not the solution. Processes should be able to >>> > create namespaces as they are designed but at the same time we need to >>> > have controls to 'contain' them if a need arise. Setting max_no to 0 >>> > is not the solution that I'm looking for since it doesn't solve the >>> > problem. >>> >>> well yesterday we were told that was explicitly not the goal, but that was >>> not by you ... i just mention it to explain why we seem to be walking in >>> circles a bit.
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
"Mahesh Bandewar (महेश बंडेवार)" writes: > [resend response as earlier one failed because of formatting issues] > > On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn wrote: >> >> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) >> wrote: >> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner >> > wrote: >> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) >> > > wrote: >> > >> Sorry folks I was traveling and seems like lot happened on this thread. >> > >> :p >> > >> >> > >> I will try to response few of these comments selectively - >> > >> >> > >> > The thing that makes me hesitate with this set is that it is a >> > >> > permanent new feature to address what (I hope) is a temporary >> > >> > problem. >> > >> I agree this is permanent new feature but it's not solving a temporary >> > >> problem. It's impossible to assess what and when new vulnerability >> > >> that could show up. I think Daniel summed it up appropriately in his >> > >> response >> > >> >> > >> > Seems like there are two naive ways to do it, the first being to just >> > >> > look at all code under ns_capable() plus code called from there. It >> > >> > seems like looking at the result of that could be fruitful. >> > >> This is really hard. The main issue that there were features designed >> > >> and developed before user-ns days with an assumption that unprivileged >> > >> users will never get certain capabilities which only root user gets. >> > >> Now that is not true anymore with user-ns creation with mapping root >> > >> for any process. Also at the same time blocking user-ns creation for >> > >> eveyone is a big-hammer which is not needed too. So it's not that easy >> > >> to just perform a code-walk-though and correct those decisions now. >> > >> >> > >> > It seems to me that the existing control in >> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct >> > >> > tape >> > >> > in that case. 
>> > >> This solution is essentially blocking unprivileged users from using >> > >> the user-namespaces entirely. This is not really a solution that can >> > >> work. The solution that this patch-set adds allows unprivileged users >> > >> to create user-namespaces. Actually the proposed solution is more >> > >> fine-grained approach than the unprivileged_userns_clone solution >> > >> since you can selectively block capabilities rather than completely >> > >> blocking the functionality. >> > > >> > > I've been talking to Stéphane today about this and we should also keep >> > > in mind >> > > that we have: >> > > >> > > chb@conventiont|~ >> > >> ls -al /proc/sys/user/ >> > > total 0 >> > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 . >> > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 .. >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces >> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces >> > > >> > > These files allow you to limit the number of namespaces that can be >> > > created >> > > *per namespace* type. So let's say your system runs a bunch of user >> > > namespaces >> > > you can do: >> > > >> > > chb@conventiont|~ >> > >> echo 0 > /proc/sys/user/max_user_namespaces >> > > >> > > So that the next time you try to create a user namespaces you'd see: >> > > >> > > chb@conventiont|~ >> > >> unshare -U >> > > unshare: unshare failed: No space left on device >> > > >> > > So there's not even a need to upstream a new sysctl since we have ways of >> > > blocking this. 
>> > > >> > I'm not sure how it's solving the problem that my patch-set is addressing? >> > I agree though that the need for unprivileged_userns_clone sysctl goes >> > away as this is equivalent to setting that sysctl to 0 as you have >> > described above. >> >> oh right that was the reasoning iirc for not needing the other sysctl. >> >> > However as I mentioned earlier, blocking processes from creating >> > user-namespaces is not the solution. Processes should be able to >> > create namespaces as they are designed but at the same time we need to >> > have controls to 'contain' them if a need arise. Setting max_no to 0 >> > is not the solution that I'm looking for since it doesn't solve the >> > problem. >> >> well yesterday we were told that was explicitly not the goal, but that was >> not by you ... i just mention it to explain why we seem to be walking in >> circles a bit. >> >> anyway the bounding set doesn't actually make sense so forget that. the >> question then is just whether it makes sense to allow things to continue >> at all in t
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On 11/09/2017 01:05 PM, Serge E. Hallyn wrote:
> Would the existing capability bounding set not suffice for that?
>
> The 'permanent' bounding set turns out to not be a good fit for
> the problem being discussed in this thread, but please feel free to
> start a new thread if you want to discuss your use case.

Sure. I will formulate something for a new thread. What seems to be asked
for here is a way to globally patch the capability sets of an entire
process subtree.

-chrish
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting chris hyser (chris.hy...@oracle.com):
> On 11/06/2017 10:23 PM, Serge E. Hallyn wrote:
> >I think I definitely prefer what I mentioned in the email to Boris.
> >Basically a "permanent capability bounding set". The normal bounding
> >set gets reset to a full set on every new user_ns creation. In this
> >proposal, it would instead be set to the calling task's permanent
> >capability set, which starts (at boot) full, and which privileged
> >tasks can pull capabilities out of.
>
> Actually, this may solve a similar problem I've been looking at. The
> idea was basically at strategic points in the kernel (possibly LSM
> hook sites, still evaluating, and probably syscall entry) validate
> that a task has not "magically" acquired capabilities that it or
> parent specifically said it cannot have and then take some action
> like say killing it immediately. Using your terms, basically make
> the "permanent capability set" a write-once privilege escalation
> defense. To handle the 0-day threat, perhaps make it writable but
> only with more "restrictive" values.

Would the existing capability bounding set not suffice for that?

The 'permanent' bounding set turns out to not be a good fit for
the problem being discussed in this thread, but please feel free to
start a new thread if you want to discuss your use case.
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On 11/06/2017 10:23 PM, Serge E. Hallyn wrote:
> I think I definitely prefer what I mentioned in the email to Boris.
> Basically a "permanent capability bounding set". The normal bounding
> set gets reset to a full set on every new user_ns creation. In this
> proposal, it would instead be set to the calling task's permanent
> capability set, which starts (at boot) full, and which privileged
> tasks can pull capabilities out of.

Actually, this may solve a similar problem I've been looking at. The
idea was basically to validate, at strategic points in the kernel
(possibly LSM hook sites, still evaluating, and probably syscall entry),
that a task has not "magically" acquired capabilities that it or its
parent specifically said it cannot have, and then take some action like,
say, killing it immediately. Using your terms, basically make the
"permanent capability set" a write-once privilege escalation defense. To
handle the 0-day threat, perhaps make it writable but only with more
"restrictive" values.

-chrish
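
The "permanent capability bounding set" idea above can be sketched as a toy model. This is a minimal illustration in Python, not kernel code: the `Task` class, its method names, and the three-entry capability universe are hypothetical, and the rule that only a privileged task may shrink the set is taken from the description in this message. The only property it tries to show is that the permanent set can shrink but never grow, and that a new user namespace starts from it instead of from a full set.

```python
from dataclasses import dataclass

# Reduced capability universe, for illustration only.
ALL_CAPS = frozenset({"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_SYS_ADMIN"})


@dataclass
class Task:
    """Toy model of a task; field names follow the thread, not kernel code."""

    privileged: bool = False
    cap_bset: frozenset = ALL_CAPS       # normal bounding set
    perm_cap_bset: frozenset = ALL_CAPS  # proposed permanent set, full at boot

    def drop_permanent(self, cap: str) -> None:
        """Remove a capability from the permanent set; there is no way to add."""
        if not self.privileged:
            raise PermissionError("only privileged tasks may shrink perm_cap_bset")
        self.perm_cap_bset = self.perm_cap_bset - {cap}

    def create_user_ns(self) -> "Task":
        """Return the task as seen inside a newly created user namespace.

        Instead of resetting the bounding set to the full set, the proposal
        resets it to the creator's permanent set. Treating the new task as
        unprivileged with respect to the permanent set is an assumption.
        """
        return Task(
            privileged=False,
            cap_bset=self.perm_cap_bset,
            perm_cap_bset=self.perm_cap_bset,
        )
```

A privileged task that calls `drop_permanent("CAP_SYS_ADMIN")` and then creates a user namespace would hand the new namespace a bounding set without CAP_SYS_ADMIN, which is the "write-once, only more restrictive" behaviour described above.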
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com):
> Of course. Let's take an example of the CVE that I have mentioned in
> my cover-letter -
> CVE-2017-7308(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-7308).
> It's well documented and even has an
> exploit(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
> c-program that can demonstrate how it can be used against a non-patched
> kernel. There is a very nice blog
> post(https://googleprojectzero.blogspot.kr/2017/05/exploiting-linux-kernel-via-packet.html)
> about this vulnerability by Andrey Konovalov.

Ok, thanks. It's a good example because the fix for this CVE actually
came by itself
(http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/debian.master/changelog).
Normally multiple CVEs come at the same time, which would make a
workaround for one not helpful. This is a good counter-example.

I'm going to maintain that I really don't like this. But it looks
useful, so ack on the concept, I'll just have to look again at the code
now.

Thanks for indulging me.

-serge
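
To make the CVE-2017-7308 case concrete, here is a minimal sketch of the kind of gate being discussed, assuming the knob is a bitmask of capabilities that are refused inside "controlled" user namespaces. The function names and the blocklist form are hypothetical (the patch itself may express the knob differently); the capability numbers are the ones in `<linux/capability.h>`. Per the linked write-up, the exploit starts from an AF_PACKET socket, which needs CAP_NET_RAW, so that is the bit blocked here.

```python
# Capability numbers from <linux/capability.h>.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_ADMIN = 21


def cap_mask(*caps: int) -> int:
    """Build a bitmask with one bit set per capability number."""
    mask = 0
    for cap in caps:
        mask |= 1 << cap
    return mask


def ns_capable_sketch(task_caps: int, cap: int, controlled_ns: bool, blocked: int) -> bool:
    """Hypothetical stand-in for a capability check in a user namespace.

    A capability listed in `blocked` is refused when the namespace is
    controlled; otherwise the task's own capability set decides.
    """
    if controlled_ns and blocked & (1 << cap):
        return False
    return bool(task_caps & (1 << cap))


# Root inside a user namespace holds all three capabilities.
ns_root = cap_mask(CAP_NET_ADMIN, CAP_NET_RAW, CAP_SYS_ADMIN)

# While the kernel is unpatched, the admin blocks only CAP_NET_RAW.
while_unpatched = cap_mask(CAP_NET_RAW)
```

With `while_unpatched` in place, `ns_capable_sketch(ns_root, CAP_NET_RAW, True, while_unpatched)` is false while the CAP_SYS_ADMIN check still passes; once the kernel is patched, setting the mask back to 0 restores the original behaviour.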
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
[resend response as earlier one failed because of formatting issues] On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn wrote: > > On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) > wrote: > > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner > > wrote: > > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) > > > wrote: > > >> Sorry folks I was traveling and seems like lot happened on this thread. > > >> :p > > >> > > >> I will try to response few of these comments selectively - > > >> > > >> > The thing that makes me hesitate with this set is that it is a > > >> > permanent new feature to address what (I hope) is a temporary > > >> > problem. > > >> I agree this is permanent new feature but it's not solving a temporary > > >> problem. It's impossible to assess what and when new vulnerability > > >> that could show up. I think Daniel summed it up appropriately in his > > >> response > > >> > > >> > Seems like there are two naive ways to do it, the first being to just > > >> > look at all code under ns_capable() plus code called from there. It > > >> > seems like looking at the result of that could be fruitful. > > >> This is really hard. The main issue that there were features designed > > >> and developed before user-ns days with an assumption that unprivileged > > >> users will never get certain capabilities which only root user gets. > > >> Now that is not true anymore with user-ns creation with mapping root > > >> for any process. Also at the same time blocking user-ns creation for > > >> eveyone is a big-hammer which is not needed too. So it's not that easy > > >> to just perform a code-walk-though and correct those decisions now. > > >> > > >> > It seems to me that the existing control in > > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct > > >> > tape > > >> > in that case. > > >> This solution is essentially blocking unprivileged users from using > > >> the user-namespaces entirely. 
This is not really a solution that can > > >> work. The solution that this patch-set adds allows unprivileged users > > >> to create user-namespaces. Actually the proposed solution is more > > >> fine-grained approach than the unprivileged_userns_clone solution > > >> since you can selectively block capabilities rather than completely > > >> blocking the functionality. > > > > > > I've been talking to Stéphane today about this and we should also keep in > > > mind > > > that we have: > > > > > > chb@conventiont|~ > > >> ls -al /proc/sys/user/ > > > total 0 > > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 . > > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 .. > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces > > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces > > > > > > These files allow you to limit the number of namespaces that can be > > > created > > > *per namespace* type. So let's say your system runs a bunch of user > > > namespaces > > > you can do: > > > > > > chb@conventiont|~ > > >> echo 0 > /proc/sys/user/max_user_namespaces > > > > > > So that the next time you try to create a user namespaces you'd see: > > > > > > chb@conventiont|~ > > >> unshare -U > > > unshare: unshare failed: No space left on device > > > > > > So there's not even a need to upstream a new sysctl since we have ways of > > > blocking this. > > > > > I'm not sure how it's solving the problem that my patch-set is addressing? 
> > I agree though that the need for unprivileged_userns_clone sysctl goes > > away as this is equivalent to setting that sysctl to 0 as you have > > described above. > > oh right that was the reasoning iirc for not needing the other sysctl. > > > However as I mentioned earlier, blocking processes from creating > > user-namespaces is not the solution. Processes should be able to > > create namespaces as they are designed but at the same time we need to > > have controls to 'contain' them if a need arise. Setting max_no to 0 > > is not the solution that I'm looking for since it doesn't solve the > > problem. > > well yesterday we were told that was explicitly not the goal, but that was > not by you ... i just mention it to explain why we seem to be walking in > circles a bit. > > anyway the bounding set doesn't actually make sense so forget that. the > question then is just whether it makes sense to allow things to continue > at all in this situation. would you mind indulging me by giving one or two > concrete examples in the previous known cves of what capabilities you would > have
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) wrote: > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner > wrote: > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) > > wrote: > >> Sorry folks I was traveling and seems like lot happened on this thread. :p > >> > >> I will try to response few of these comments selectively - > >> > >> > The thing that makes me hesitate with this set is that it is a > >> > permanent new feature to address what (I hope) is a temporary > >> > problem. > >> I agree this is permanent new feature but it's not solving a temporary > >> problem. It's impossible to assess what and when new vulnerability > >> that could show up. I think Daniel summed it up appropriately in his > >> response > >> > >> > Seems like there are two naive ways to do it, the first being to just > >> > look at all code under ns_capable() plus code called from there. It > >> > seems like looking at the result of that could be fruitful. > >> This is really hard. The main issue that there were features designed > >> and developed before user-ns days with an assumption that unprivileged > >> users will never get certain capabilities which only root user gets. > >> Now that is not true anymore with user-ns creation with mapping root > >> for any process. Also at the same time blocking user-ns creation for > >> eveyone is a big-hammer which is not needed too. So it's not that easy > >> to just perform a code-walk-though and correct those decisions now. > >> > >> > It seems to me that the existing control in > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape > >> > in that case. > >> This solution is essentially blocking unprivileged users from using > >> the user-namespaces entirely. This is not really a solution that can > >> work. The solution that this patch-set adds allows unprivileged users > >> to create user-namespaces. 
Actually the proposed solution is more > >> fine-grained approach than the unprivileged_userns_clone solution > >> since you can selectively block capabilities rather than completely > >> blocking the functionality. > > > > I've been talking to Stéphane today about this and we should also keep in > > mind > > that we have: > > > > chb@conventiont|~ > >> ls -al /proc/sys/user/ > > total 0 > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 . > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 .. > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces > > > > These files allow you to limit the number of namespaces that can be created > > *per namespace* type. So let's say your system runs a bunch of user > > namespaces > > you can do: > > > > chb@conventiont|~ > >> echo 0 > /proc/sys/user/max_user_namespaces > > > > So that the next time you try to create a user namespaces you'd see: > > > > chb@conventiont|~ > >> unshare -U > > unshare: unshare failed: No space left on device > > > > So there's not even a need to upstream a new sysctl since we have ways of > > blocking this. > > > I'm not sure how it's solving the problem that my patch-set is addressing? > I agree though that the need for unprivileged_userns_clone sysctl goes > away as this is equivalent to setting that sysctl to 0 as you have > described above. oh right that was the reasoning iirc for not needing the other sysctl. > However as I mentioned earlier, blocking processes from creating > user-namespaces is not the solution. 
> Processes should be able to
> create namespaces as they are designed but at the same time we need to
> have controls to 'contain' them if a need arise. Setting max_no to 0
> is not the solution that I'm looking for since it doesn't solve the
> problem.

well yesterday we were told that was explicitly not the goal, but that was
not by you ... i just mention it to explain why we seem to be walking in
circles a bit.

anyway the bounding set doesn't actually make sense so forget that. the
question then is just whether it makes sense to allow things to continue
at all in this situation. would you mind indulging me by giving one or two
concrete examples in the previous known CVEs of what capabilities you would
have dropped to allow the rest to continue to be safely used?

thanks,
serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner wrote: > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) > wrote: >> Sorry folks I was traveling and seems like lot happened on this thread. :p >> >> I will try to response few of these comments selectively - >> >> > The thing that makes me hesitate with this set is that it is a >> > permanent new feature to address what (I hope) is a temporary >> > problem. >> I agree this is permanent new feature but it's not solving a temporary >> problem. It's impossible to assess what and when new vulnerability >> that could show up. I think Daniel summed it up appropriately in his >> response >> >> > Seems like there are two naive ways to do it, the first being to just >> > look at all code under ns_capable() plus code called from there. It >> > seems like looking at the result of that could be fruitful. >> This is really hard. The main issue that there were features designed >> and developed before user-ns days with an assumption that unprivileged >> users will never get certain capabilities which only root user gets. >> Now that is not true anymore with user-ns creation with mapping root >> for any process. Also at the same time blocking user-ns creation for >> eveyone is a big-hammer which is not needed too. So it's not that easy >> to just perform a code-walk-though and correct those decisions now. >> >> > It seems to me that the existing control in >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape >> > in that case. >> This solution is essentially blocking unprivileged users from using >> the user-namespaces entirely. This is not really a solution that can >> work. The solution that this patch-set adds allows unprivileged users >> to create user-namespaces. Actually the proposed solution is more >> fine-grained approach than the unprivileged_userns_clone solution >> since you can selectively block capabilities rather than completely >> blocking the functionality. 
>
> I've been talking to Stéphane today about this and we should also keep in mind
> that we have:
>
> chb@conventiont|~
> > ls -al /proc/sys/user/
> total 0
> dr-xr-xr-x 1 root root 0 Nov 6 23:32 .
> dr-xr-xr-x 1 root root 0 Nov 2 22:13 ..
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces
> -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces
>
> These files allow you to limit the number of namespaces that can be created
> *per namespace* type. So let's say your system runs a bunch of user namespaces
> you can do:
>
> chb@conventiont|~
> > echo 0 > /proc/sys/user/max_user_namespaces
>
> So that the next time you try to create a user namespaces you'd see:
>
> chb@conventiont|~
> > unshare -U
> unshare: unshare failed: No space left on device
>
> So there's not even a need to upstream a new sysctl since we have ways of
> blocking this.
>
I'm not sure how it's solving the problem that my patch-set is addressing.
I agree, though, that the need for the unprivileged_userns_clone sysctl
goes away, as this is equivalent to setting that sysctl to 0 as you have
described above.

However, as I mentioned earlier, blocking processes from creating
user-namespaces is not the solution. Processes should be able to
create namespaces as they are designed, but at the same time we need to
have controls to 'contain' them if a need arises. Setting max_no to 0
is not the solution that I'm looking for since it doesn't solve the
problem.

> Also I'd like to point out that a lot of capability checks and actual security
> vulnerabilities are associated with CAP_SYS_ADMIN.
> So what you likely want to do
> is block CAP_SYS_ADMIN in user namespaces but at this point they become
> basically useless for a lot of interesting use cases. In addition, this patch
> would add another layer of complexity that is - imho - not really warranted
> given what we already have.
I disagree. I'm not sure how this patch is adding complexity. The
functionality is maintained exactly as it is, with an extra knob which
allows you to take control back if a situation arises. Once the kernel is
patched for whatever was discovered, life returns to normal by
readjusting the knob. It's as simple as that!

> The relationship between capabilities and user
> namespaces should stay as simply as possible so that it stays maintaineable.
> User namespaces already introduce a proper layer of complexity.
> Just my two cents. I might be totally off here of course.
The side effect of the solution is that you have sort-of scaled-down /
broken functionality without blocking the feature complete
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) wrote: > Sorry folks I was traveling and seems like lot happened on this thread. :p > > I will try to response few of these comments selectively - > > > The thing that makes me hesitate with this set is that it is a > > permanent new feature to address what (I hope) is a temporary > > problem. > I agree this is permanent new feature but it's not solving a temporary > problem. It's impossible to assess what and when new vulnerability > that could show up. I think Daniel summed it up appropriately in his > response > > > Seems like there are two naive ways to do it, the first being to just > > look at all code under ns_capable() plus code called from there. It > > seems like looking at the result of that could be fruitful. > This is really hard. The main issue that there were features designed > and developed before user-ns days with an assumption that unprivileged > users will never get certain capabilities which only root user gets. > Now that is not true anymore with user-ns creation with mapping root > for any process. Also at the same time blocking user-ns creation for > eveyone is a big-hammer which is not needed too. So it's not that easy > to just perform a code-walk-though and correct those decisions now. > > > It seems to me that the existing control in > > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape > > in that case. > This solution is essentially blocking unprivileged users from using > the user-namespaces entirely. This is not really a solution that can > work. The solution that this patch-set adds allows unprivileged users > to create user-namespaces. Actually the proposed solution is more > fine-grained approach than the unprivileged_userns_clone solution > since you can selectively block capabilities rather than completely > blocking the functionality. 
I've been talking to Stéphane today about this and we should also keep in mind
that we have:

chb@conventiont|~ > ls -al /proc/sys/user/
total 0
dr-xr-xr-x 1 root root 0 Nov 6 23:32 .
dr-xr-xr-x 1 root root 0 Nov 2 22:13 ..
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces
-rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces

These files allow you to limit the number of namespaces that can be created
*per namespace* type. So let's say your system runs a bunch of user namespaces
you can do:

chb@conventiont|~ > echo 0 > /proc/sys/user/max_user_namespaces

So that the next time you try to create a user namespace you'd see:

chb@conventiont|~ > unshare -U
unshare: unshare failed: No space left on device

So there's not even a need to upstream a new sysctl since we have ways of
blocking this.

Also I'd like to point out that a lot of capability checks and actual security
vulnerabilities are associated with CAP_SYS_ADMIN. So what you likely want to do
is block CAP_SYS_ADMIN in user namespaces but at this point they become
basically useless for a lot of interesting use cases. In addition, this patch
would add another layer of complexity that is - imho - not really warranted
given what we already have. The relationship between capabilities and user
namespaces should stay as simple as possible so that it stays maintainable.
User namespaces already introduce a proper layer of complexity.

Just my two cents. I might be totally off here of course.

Christian
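For anyone who wants to check these limits from a script rather than by hand, here is a minimal sketch. It only reads the files listed above; the helper names (read_namespace_limit, creation_blocked) and the sysctl_dir parameter are made up for illustration. It also looks only at the file visible in the current namespace, so treat a non-zero result as "not blocked here" and nothing stronger.

```python
from pathlib import Path

# Directory holding the per-namespace-type limits shown in the listing above.
USER_SYSCTL_DIR = Path("/proc/sys/user")


def read_namespace_limit(ns_type, sysctl_dir=USER_SYSCTL_DIR):
    """Return the value of max_<ns_type>_namespaces, or None if unavailable."""
    path = Path(sysctl_dir) / f"max_{ns_type}_namespaces"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, an older kernel, or the file is not readable here.
        return None


def creation_blocked(ns_type, sysctl_dir=USER_SYSCTL_DIR):
    """True only when the limit is readable and set to 0."""
    return read_namespace_limit(ns_type, sysctl_dir) == 0


if __name__ == "__main__":
    for ns_type in ("user", "net", "mnt", "pid"):
        print(ns_type, read_namespace_limit(ns_type))
```

The sysctl_dir parameter exists only so the helpers can be pointed at a scratch directory when trying them out.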
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Sorry folks, I was traveling and it seems like a lot happened on this thread. :p

I will try to respond to a few of these comments selectively -

> The thing that makes me hesitate with this set is that it is a
> permanent new feature to address what (I hope) is a temporary
> problem.

I agree this is a permanent new feature but it's not solving a temporary
problem. It's impossible to predict what new vulnerability could show up,
or when. I think Daniel summed it up appropriately in his response.

> Seems like there are two naive ways to do it, the first being to just
> look at all code under ns_capable() plus code called from there. It
> seems like looking at the result of that could be fruitful.

This is really hard. The main issue is that there were features designed
and developed before user-ns days with an assumption that unprivileged
users will never get certain capabilities which only the root user gets.
That is not true anymore, since any process can create a user-ns that
maps it to root. At the same time, blocking user-ns creation for
everyone is a big hammer that is not needed either. So it's not that easy
to just perform a code walk-through and correct those decisions now.

> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.

This solution essentially blocks unprivileged users from using
user namespaces entirely. This is not really a solution that can
work. The solution that this patch-set adds allows unprivileged users
to create user namespaces. Actually the proposed solution is a more
fine-grained approach than the unprivileged_userns_clone solution,
since you can selectively block capabilities rather than completely
blocking the functionality.

> I meant each task has a perm_cap_bset next to the cap_bset. So task
> p1 (if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset,
> p2 (if it has privilege) can drop CAP_NET_ADMIN. When p1 creates a
> new user_ns, that init task has its cap_bset set to all caps but
> CAP_SYS_ADMIN.
>
> I think for simplicity perm_cap_bset would *only* affect the filling
> of cap_bset at user namespace creation. So if you wanted to drop a
> capability from your own cap_bset as well, you'd have to do that
> separately.

My original intention is to reduce the attack surface when
vulnerabilities are discovered / published, but I don't see how this
solves that issue. Also, the reason to have a sysctl is to have simple
control across the board to contain the situation. If that is not
addressed then we might need some other solution on top of this.
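To make the perm_cap_bset idea quoted above concrete, here is a toy model of how I read it. This is not kernel code and not taken from any patch: the Task class, the "privileged" flag, and the four-entry capability list are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# A handful of capability names for illustration; the real list is much longer.
ALL_CAPS = frozenset(
    {"CAP_SYS_ADMIN", "CAP_NET_ADMIN", "CAP_SYS_CHROOT", "CAP_SETUID"}
)


@dataclass
class Task:
    """Toy task: the regular bounding set plus the proposed permanent one."""

    privileged: bool = False
    cap_bset: frozenset = ALL_CAPS
    perm_cap_bset: frozenset = ALL_CAPS

    def drop_perm(self, cap):
        """Drop a capability from perm_cap_bset; only allowed with privilege."""
        if not self.privileged:
            raise PermissionError("dropping from perm_cap_bset needs privilege")
        self.perm_cap_bset = self.perm_cap_bset - {cap}

    def fork(self):
        """Children inherit both sets unchanged."""
        return Task(self.privileged, self.cap_bset, self.perm_cap_bset)

    def create_user_ns(self):
        """Init task of a new user_ns: cap_bset is filled from perm_cap_bset.

        Assumption: root inside the new namespace does not count as
        'privileged' for the purpose of editing perm_cap_bset.
        """
        return Task(
            privileged=False,
            cap_bset=self.perm_cap_bset,
            perm_cap_bset=self.perm_cap_bset,
        )


if __name__ == "__main__":
    p1 = Task(privileged=True)
    p1.drop_perm("CAP_SYS_ADMIN")
    ns_init = p1.create_user_ns()
    print(sorted(ns_init.cap_bset))
```

Note that in this model p1's own cap_bset still contains CAP_SYS_ADMIN after drop_perm(); only tasks that start a new user_ns see the reduced set, which matches the "only affects the filling of cap_bset at user namespace creation" part of the description.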
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, Nov 06, 2017 at 07:01:58PM -0500, Boris Lukashev wrote: > On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn wrote: > > Quoting Boris Lukashev (blukas...@sempervictus.com): > >> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn wrote: > >> > Quoting Daniel Micay (danielmi...@gmail.com): > >> >> Substantial added attack surface will never go away as a problem. There > >> >> aren't a finite number of vulnerabilities to be found. > >> > > >> > There's varying levels of usefulness and quality. There is code which I > >> > want to be able to use in a container, and code which I can't ever see a > >> > reason for using there. The latter, especially if it's also in a > >> > staging driver, would be nice to have a toggle to disable. > >> > > >> > You're not advocating dropping the added attack surface, only adding a > >> > way of dealing with an 0day after the fact. Privilege raising 0days can > >> > exist anywhere, not just in code which only root in a user namespace can > >> > exercise. So from that point of view, ksplice seems a more complete > >> > solution. Why not just actually fix the bad code block when we know > >> > about it? > >> > > >> > Finally, it has been well argued that you can gain many new caps from > >> > having only a few others. Given that, how could you ever be sure that, > >> > if an 0day is found which allows root in a user ns to abuse > >> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them > >> > would suffice? It seems to me that the existing control in > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape > >> > in that case. > >> > > >> > -serge > >> > >> This seems to be heading toward "we need full zones in Linux" with > >> their own procfs and sysfs namespace and a stricter isolation model > >> for resources and capabilities. 
So long as things can happen in a > >> namespace which have a privileged relationship with host resources, > >> this is going to be cat-and-mouse to one degree or another. > >> > >> Containers and namespaces dont have a one-to-one relationship, so i'm > >> not sure that's the best term to use in the kernel security context > > > > Sorry - what's not the best term to use? > > Pardon, "containers," since they're namespaces+system construct. > > > > >> since there's a bunch of userspace and implementation delta across the > >> different systems (with their own security models and so forth). > >> Without accounting for what a specific implementation may or may not > >> do, and only looking at "how do we reduce privileged impact on parent > >> context from unprivileged namespaces," this patch does seem to provide > >> a logical way of reducing the privileges available in such a namespace > >> and often needed to mount escapes/impact parent context. > > > > What different implementations do is irrelevant - as an unprivileged user > > I can always, with no help, create a new user namespace mapping my current > > uid to root, and exercise this code. So the security model implemented > > by a particular userspace namespace-using driver doesn't matter, as it > > only restricts me if I choose to use it. > > > > But, I guess you're actually saying that some program might know that it > > should never use network code so want to drop CAP_NET_*? And you're > > saying that a "global capability bounding set" might be useful? > > > > The "global capability bounding set" with forced inheritance can be > used to prevent the vector you describe wherein the capability of UID > 0 in the child NS is restricted from the parent implicitly, so yes, > that nomenclature seems appropriate. > > > Would it be better to actually implement it as a new bounding set that > > is maintained across user namespace creations, but is per-task (inherted > > by children of course)? Instead of a sysctl? 
> > > > -serge > > In line with the previous comment, the inheritance across subsequent > invocations should be forced to prevent the context you described. > Please pardon my ignorance, not sure what you mean in terms of > "per-task" across namespace creation. I meant each task has a perm_cap_bset next to the cap_bset. So task p1 (if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset, p2 (if it has privilege) can drop CAP_NET_ADMIN. When p1 creates a new user_ns, that init task has its cap_bset set to all caps but CAP_SYS_ADMIN. I think for simplicity perm_cap_bset would *only* affect the filling of cap_bset at user namespace creation. So if you wanted to drop a capability from your own cap_bset as well, you'd have to do that separately.
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, Nov 06, 2017 at 09:16:03PM -0500, Daniel Micay wrote:
> On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> > Quoting Daniel Micay (danielmi...@gmail.com):
> > > Substantial added attack surface will never go away as a problem.
> > > There aren't a finite number of vulnerabilities to be found.
> >
> > There's varying levels of usefulness and quality. There is code which
> > I want to be able to use in a container, and code which I can't ever
> > see a reason for using there. The latter, especially if it's also in
> > a staging driver, would be nice to have a toggle to disable.
> >
> > You're not advocating dropping the added attack surface, only adding a
> > way of dealing with an 0day after the fact. Privilege raising 0days
> > can exist anywhere, not just in code which only root in a user
> > namespace can exercise. So from that point of view, ksplice seems a
> > more complete solution. Why not just actually fix the bad code block
> > when we know about it?
>
> That's not what I'm advocating. I only care about it for proactive
> attack surface reduction downstream. I have no interest in using it to
> block access to known vulnerabilities.
>
> > Finally, it has been well argued that you can gain many new caps from
> > having only a few others. Given that, how could you ever be sure
> > that, if an 0day is found which allows root in a user ns to abuse
> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> > would suffice?
>
> I didn't suggest using it that way...
>
> > It seems to me that the existing control in
> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct
> > tape in that case.
>
> There's no such thing as unprivileged_userns_clone in mainline.

Hm. I was sure Kees had gotten that in... I guess I was wrong.

> The advantage of this over unprivileged_userns_clone in Debian and maybe
> some other distributions is not giving up unprivileged app containers /
> sandboxes implemented via user namespaces. For example, Chromium's user
> namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium
> will be dropping their setuid sandbox, forcing usage of user namespaces
> to avoid losing the sandbox which will greatly increase local kernel
> attack surface on the host by exposing netfilter management, etc. to
> unprivileged users.
>
> The proposed approach isn't necessarily the best way to implement this
> kind of mitigation but I think it's filling a real need.

I think I definitely prefer what I mentioned in the email to Boris.
Basically a "permanent capability bounding set". The normal bounding
set gets reset to a full set on every new user_ns creation. In this
proposal, it would instead be set to the calling task's permanent
capability set, which starts (at boot) full, and which privileged
tasks can pull capabilities out of.

-serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem.
> > There aren't a finite number of vulnerabilities to be found.
>
> There's varying levels of usefulness and quality. There is code which
> I want to be able to use in a container, and code which I can't ever
> see a reason for using there. The latter, especially if it's also in
> a staging driver, would be nice to have a toggle to disable.
>
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact. Privilege raising 0days
> can exist anywhere, not just in code which only root in a user
> namespace can exercise. So from that point of view, ksplice seems a
> more complete solution. Why not just actually fix the bad code block
> when we know about it?

That's not what I'm advocating. I only care about it for proactive
attack surface reduction downstream. I have no interest in using it to
block access to known vulnerabilities.

> Finally, it has been well argued that you can gain many new caps from
> having only a few others. Given that, how could you ever be sure
> that, if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?

I didn't suggest using it that way...

> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct
> tape in that case.

There's no such thing as unprivileged_userns_clone in mainline.

The advantage of this over unprivileged_userns_clone in Debian and maybe
some other distributions is not giving up unprivileged app containers /
sandboxes implemented via user namespaces. For example, Chromium's user
namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium
will be dropping their setuid sandbox, forcing usage of user namespaces
to avoid losing the sandbox which will greatly increase local kernel
attack surface on the host by exposing netfilter management, etc. to
unprivileged users.

The proposed approach isn't necessarily the best way to implement this
kind of mitigation but I think it's filling a real need.
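As a rough illustration of what "only needs CAP_SYS_CHROOT" looks like as a bounding-set bitmask, here is a small sketch. The capability numbers are the ones from include/uapi/linux/capability.h, and CapBnd is the field reported in /proc/<pid>/status; the helper names (cap_mask, has_cap, parse_capbnd) are invented for this example and are not part of any patch in this thread.

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_NET_ADMIN = 12
CAP_SYS_CHROOT = 18
CAP_SYS_ADMIN = 21


def cap_mask(*caps):
    """Build a bitmask with one bit set per capability number."""
    mask = 0
    for cap in caps:
        mask |= 1 << cap
    return mask


def has_cap(mask, cap):
    """Return True if the bit for the given capability is set in the mask."""
    return bool(mask & (1 << cap))


def parse_capbnd(status_text):
    """Extract the CapBnd bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split(":", 1)[1].strip(), 16)
    return None


if __name__ == "__main__":
    sandbox_mask = cap_mask(CAP_SYS_CHROOT)
    print(hex(sandbox_mask))
    print(has_cap(sandbox_mask, CAP_NET_ADMIN))
```

A sandbox restricted this way would keep chroot() available while the bits for CAP_NET_ADMIN and CAP_SYS_ADMIN stay clear, which is the kind of reduction being argued for here.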
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn wrote: > Quoting Boris Lukashev (blukas...@sempervictus.com): >> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn wrote: >> > Quoting Daniel Micay (danielmi...@gmail.com): >> >> Substantial added attack surface will never go away as a problem. There >> >> aren't a finite number of vulnerabilities to be found. >> > >> > There's varying levels of usefulness and quality. There is code which I >> > want to be able to use in a container, and code which I can't ever see a >> > reason for using there. The latter, especially if it's also in a >> > staging driver, would be nice to have a toggle to disable. >> > >> > You're not advocating dropping the added attack surface, only adding a >> > way of dealing with an 0day after the fact. Privilege raising 0days can >> > exist anywhere, not just in code which only root in a user namespace can >> > exercise. So from that point of view, ksplice seems a more complete >> > solution. Why not just actually fix the bad code block when we know >> > about it? >> > >> > Finally, it has been well argued that you can gain many new caps from >> > having only a few others. Given that, how could you ever be sure that, >> > if an 0day is found which allows root in a user ns to abuse >> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them >> > would suffice? It seems to me that the existing control in >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape >> > in that case. >> > >> > -serge >> >> This seems to be heading toward "we need full zones in Linux" with >> their own procfs and sysfs namespace and a stricter isolation model >> for resources and capabilities. So long as things can happen in a >> namespace which have a privileged relationship with host resources, >> this is going to be cat-and-mouse to one degree or another. 
>> >> Containers and namespaces dont have a one-to-one relationship, so i'm >> not sure that's the best term to use in the kernel security context > > Sorry - what's not the best term to use? Pardon, "containers," since they're namespaces+system construct. > >> since there's a bunch of userspace and implementation delta across the >> different systems (with their own security models and so forth). >> Without accounting for what a specific implementation may or may not >> do, and only looking at "how do we reduce privileged impact on parent >> context from unprivileged namespaces," this patch does seem to provide >> a logical way of reducing the privileges available in such a namespace >> and often needed to mount escapes/impact parent context. > > What different implementations do is irrelevant - as an unprivileged user > I can always, with no help, create a new user namespace mapping my current > uid to root, and exercise this code. So the security model implemented > by a particular userspace namespace-using driver doesn't matter, as it > only restricts me if I choose to use it. > > But, I guess you're actually saying that some program might know that it > should never use network code so want to drop CAP_NET_*? And you're > saying that a "global capability bounding set" might be useful? > The "global capability bounding set" with forced inheritance can be used to prevent the vector you describe wherein the capability of UID 0 in the child NS is restricted from the parent implicitly, so yes, that nomenclature seems appropriate. > Would it be better to actually implement it as a new bounding set that > is maintained across user namespace creations, but is per-task (inherted > by children of course)? Instead of a sysctl? > > -serge In line with the previous comment, the inheritance across subsequent invocations should be forced to prevent the context you described. Please pardon my ignorance, not sure what you mean in terms of "per-task" across namespace creation. 
-Boris

--
Boris Lukashev
Systems Architect
Semper Victus
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Boris Lukashev (blukas...@sempervictus.com):
> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn wrote:
> > Quoting Daniel Micay (danielmi...@gmail.com):
> >> Substantial added attack surface will never go away as a problem. There
> >> aren't a finite number of vulnerabilities to be found.
> >
> > There's varying levels of usefulness and quality. There is code which I
> > want to be able to use in a container, and code which I can't ever see a
> > reason for using there. The latter, especially if it's also in a
> > staging driver, would be nice to have a toggle to disable.
> >
> > You're not advocating dropping the added attack surface, only adding a
> > way of dealing with an 0day after the fact. Privilege raising 0days can
> > exist anywhere, not just in code which only root in a user namespace can
> > exercise. So from that point of view, ksplice seems a more complete
> > solution. Why not just actually fix the bad code block when we know
> > about it?
> >
> > Finally, it has been well argued that you can gain many new caps from
> > having only a few others. Given that, how could you ever be sure that,
> > if an 0day is found which allows root in a user ns to abuse
> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> > would suffice? It seems to me that the existing control in
> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> > in that case.
> >
> > -serge
>
> This seems to be heading toward "we need full zones in Linux" with
> their own procfs and sysfs namespace and a stricter isolation model
> for resources and capabilities. So long as things can happen in a
> namespace which have a privileged relationship with host resources,
> this is going to be cat-and-mouse to one degree or another.
>
> Containers and namespaces don't have a one-to-one relationship, so I'm
> not sure that's the best term to use in the kernel security context

Sorry - what's not the best term to use?

> since there's a bunch of userspace and implementation delta across the
> different systems (with their own security models and so forth).
> Without accounting for what a specific implementation may or may not
> do, and only looking at "how do we reduce privileged impact on parent
> context from unprivileged namespaces," this patch does seem to provide
> a logical way of reducing the privileges available in such a namespace
> and often needed to mount escapes/impact parent context.

What different implementations do is irrelevant - as an unprivileged user
I can always, with no help, create a new user namespace mapping my current
uid to root, and exercise this code. So the security model implemented
by a particular userspace namespace-using driver doesn't matter, as it
only restricts me if I choose to use it.

But, I guess you're actually saying that some program might know that it
should never use network code so want to drop CAP_NET_*? And you're
saying that a "global capability bounding set" might be useful?

Would it be better to actually implement it as a new bounding set that
is maintained across user namespace creations, but is per-task (inherited
by children of course)? Instead of a sysctl?

-serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
>> Substantial added attack surface will never go away as a problem. There
>> aren't a finite number of vulnerabilities to be found.
>
> There's varying levels of usefulness and quality. There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there. The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
>
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact. Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise. So from that point of view, ksplice seems a more complete
> solution. Why not just actually fix the bad code block when we know
> about it?
>
> Finally, it has been well argued that you can gain many new caps from
> having only a few others. Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice? It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.
>
> -serge

This seems to be heading toward "we need full zones in Linux" with
their own procfs and sysfs namespace and a stricter isolation model
for resources and capabilities. So long as things can happen in a
namespace which have a privileged relationship with host resources,
this is going to be cat-and-mouse to one degree or another.

Containers and namespaces don't have a one-to-one relationship, so I'm
not sure that's the best term to use in the kernel security context
since there's a bunch of userspace and implementation delta across the
different systems (with their own security models and so forth).
Without accounting for what a specific implementation may or may not
do, and only looking at "how do we reduce privileged impact on parent
context from unprivileged namespaces," this patch does seem to provide
a logical way of reducing the privileges available in such a namespace
and often needed to mount escapes/impact parent context.

-Boris

--
Boris Lukashev
Systems Architect
Semper Victus
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, Nov 06, 2017 at 04:14:18PM -0600, Serge Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem. There
> > aren't a finite number of vulnerabilities to be found.
>
> There's varying levels of usefulness and quality. There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there. The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
>
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact. Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise. So from that point of view, ksplice seems a more complete
> solution. Why not just actually fix the bad code block when we know
> about it?
>
> Finally, it has been well argued that you can gain many new caps from
> having only a few others. Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice? It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.

I agree that /proc/sys/kernel/unprivileged_userns_clone is the most reasonable
thing to do. This patch introduces a layer of complexity to fine-tune user
namespace creation that - in the relevant security critical scenario - should
simply be turned off entirely.

Is /proc/sys/kernel/unprivileged_userns_clone upstreamed or is this still only
carried downstream?
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Daniel Micay (danielmi...@gmail.com):
> Substantial added attack surface will never go away as a problem. There
> aren't a finite number of vulnerabilities to be found.

There's varying levels of usefulness and quality. There is code which I
want to be able to use in a container, and code which I can't ever see a
reason for using there. The latter, especially if it's also in a
staging driver, would be nice to have a toggle to disable.

You're not advocating dropping the added attack surface, only adding a
way of dealing with an 0day after the fact. Privilege raising 0days can
exist anywhere, not just in code which only root in a user namespace can
exercise. So from that point of view, ksplice seems a more complete
solution. Why not just actually fix the bad code block when we know
about it?

Finally, it has been well argued that you can gain many new caps from
having only a few others. Given that, how could you ever be sure that,
if an 0day is found which allows root in a user ns to abuse
CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
would suffice? It seems to me that the existing control in
/proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
in that case.

-serge
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Substantial added attack surface will never go away as a problem. There aren't a finite number of vulnerabilities to be found.