[PATCH] userns/capability: Add user namespace capability
Add capability CAP_SYS_USER_NS.
Tasks having CAP_SYS_USER_NS are allowed to create a new user namespace
when calling clone or unshare with CLONE_NEWUSER.

Rationale:

Linux 3.8 saw the introduction of unprivileged user namespaces,
allowing unprivileged users (without CAP_SYS_ADMIN) to be a "fake" root
inside a separate user namespace. Before that, any namespace creation
required CAP_SYS_ADMIN (or, in practice, the user had to be root).
Unfortunately, there have been some security-relevant bugs in the
meantime. Because of the fairly complex nature of user namespaces, it is
reasonable to say that future vulnerabilities cannot be excluded. Some
distributions even wholly disable user namespaces because of this.

Both options, user namespaces with and without CAP_SYS_ADMIN, can be
said to represent the extreme ends of the spectrum. In practice, there is
no reason for every process to have the ability to create user
namespaces. Indeed, only very few and specialized programs require user
namespaces. This seems to be a perfect fit for the (file) capability
system: Privileged users could manually allow only a certain executable
to create user namespaces by setting a certain capability; I'd suggest
the name CAP_SYS_USER_NS. Executables completely unrelated to user
namespaces should not and cannot create them.

The capability should only be required in the "root" user namespace (the
user namespace with level 0) though, to allow nested user namespaces to
work as intended. If a user namespace has a level greater than 0, the
original process must have had CAP_SYS_USER_NS, so it is "trusted" anyway.

One question remains though: Does this break userspace executables that
expect being able to create user namespaces without privilege? Since
creating user namespaces without CAP_SYS_ADMIN was not possible before
Linux 3.8, programs should already expect a potential EPERM upon calling
clone. Since creating a user namespace without CAP_SYS_USER_NS would
also cause EPERM, we should be on the safe side.

Cc: linux-security-mod...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: Eric W. Biederman
Cc: Al Viro
Cc: Serge Hallyn
Cc: Andy Lutomirski
Cc: Andrew Morton
Cc: Christoph Lameter
Cc: Michael Kerrisk
Signed-off-by: Tobias Markus
---
 include/uapi/linux/capability.h     | 5 ++++-
 kernel/cred.c                       | 7 ++++++-
 security/selinux/include/classmap.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 12c37a1..d83540f 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -351,8 +351,11 @@ struct vfs_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/* Allow creating user namespaces (CLONE_NEWUSER) using clone() and unshare() */
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_SYS_USER_NS      38
+
+#define CAP_LAST_CAP         CAP_SYS_USER_NS
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/kernel/cred.c b/kernel/cred.c
index 71179a0..847d499 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -345,7 +345,12 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		return -ENOMEM;
 
 	if (clone_flags & CLONE_NEWUSER) {
-		ret = create_user_ns(new);
+		if (new->user_ns->level == 0 &&
+		    !has_capability(p, CAP_SYS_USER_NS)) {
+			ret = -EPERM;
+		} else {
+			ret = create_user_ns(new);
+		}
 		if (ret < 0)
 			goto error_put;
 	}
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 5a4eef5..07cec76 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -39,7 +39,7 @@ struct security_class_mapping secclass_map[] = {
 	    "linux_immutable", "net_bind_service", "net_broadcast",
 	    "net_admin", "net_raw", "ipc_lock", "ipc_owner", "sys_module",
 	    "sys_rawio", "sys_chroot", "sys_ptrace", "sys_pacct", "sys_admin",
-	    "sys_boot", "sys_nice", "sys_resource", "sys_time",
+	    "sys_boot", "sys_nice", "sys_resource", "sys_time", "sys_user_ns",
 	    "sys_tty_config", "mknod", "lease", "audit_write",
 	    "audit_control", "setfcap", NULL } },
 	{ "filesystem",
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] userns/capability: Add user namespace capability
On 17.10.2015 23:55, Serge E. Hallyn wrote:
> On Sat, Oct 17, 2015 at 05:58:04PM +0200, Tobias Markus wrote:
>> Add capability CAP_SYS_USER_NS.
>> Tasks having CAP_SYS_USER_NS are allowed to create a new user namespace
>> when calling clone or unshare with CLONE_NEWUSER.
>>
>> Rationale:
>>
>> Linux 3.8 saw the introduction of unprivileged user namespaces,
>> allowing unprivileged users (without CAP_SYS_ADMIN) to be a "fake" root
>> inside a separate user namespace. Before that, any namespace creation
>> required CAP_SYS_ADMIN (or, in practice, the user had to be root).
>> Unfortunately, there have been some security-relevant bugs in the
>> meantime. Because of the fairly complex nature of user namespaces, it is
>> reasonable to say that future vulnerabilities cannot be excluded. Some
>> distributions even wholly disable user namespaces because of this.
>
> Fwiw I'm not in favor of this. Debian has a patch (I believe the one
> I originally wrote for Ubuntu but which Ubuntu dropped long ago) adding a
> sysctl, off by default, for enabling user namespaces.

While it certainly works, enabling a feature like this at runtime
doesn't seem like a long-term solution.

The fact that Debian added this patch in the first place already
demonstrates that there is demand for a way to limit unprivileged user
namespace creation. Please don't get me wrong: I would *really like* to
see widespread adoption and continued development of user namespaces!
But the status quo remains: Distributions outright disabling user
namespaces (e.g. Arch Linux) won't make it easier.

>
> Posix capabilities are intended for privileged actions, not for
> actions which explicitly should not require privilege, but which
> we feel are in development.
>

Certainly, in an ideal world, user namespaces will never lead to any
kernel-level exploits. But reality is different: There *have been*
serious kernel vulnerabilities due to user namespaces, and there *will
be* serious kernel vulnerabilities due to user namespaces.

Now, those are the alternatives imho:

* Status quo: Some distributions will disable user namespaces by default
  in some way or another. Users wishing to use user namespaces will have
  to use a custom kernel or enable a sysctl flag that was patched in by
  the downstream developers. On distributions that enable user
  namespaces by default, even users that don't wish to use them in the
  first place will be affected by vulnerabilities.
* Adding a capability: First of all, there would be no need for any
  downstream patches or custom kernels. Users that wish to use user
  namespaces would only have to enable the capability on the affected
  executables, if that hasn't been done by the package maintainers
  already. Users that might not even know of user namespaces have their
  peace.

> In general, the feeling is that putting a feature like this behind a
> wall will only slow down the finding of any bugs, so I think the goal
> itself is questionable. But the chosen means for achieving your goal
> are definitely wrong.

I'm not talking about removing user namespaces altogether or making them
impossible to use - as I said above, users wouldn't notice anything in
the best case. Replacing setuid binaries with capability-based ones has
been done for quite some time now and I don't think anyone complained.

I honestly don't see why adding a new capability would slow down finding
bugs. Not every program magically profits from user namespaces. Why
would, say, GCC, date or vim improve by using user namespaces? My point
is that use cases for user namespaces won't magically rain down from
Heaven just because it is possible to use them without privilege. And it
is hardly difficult to add the capability to those applications that use
user namespaces, is it?

    setcap cap_sys_user_ns+ep $binary

doesn't sound very complicated to me.

I would actually say that not adding this capability would slow down
finding bugs, since users are less inclined to enable the feature if they
can't limit its security impact.

Furthermore, saying "let's enable this complex security-relevant feature
by default and make it impossible to limit it to certain files so users
will find more bugs" is a fundamentally wrong approach to security imho.
First, you aren't likely to get more bug reports because distributions
aren't that willing to take risks. Second, even if you get more bug
reports, _the damage is already done_. Sysadmins won't be that happy and
will very likely disable the very feature that caused the damage in the
first place.

>
>> Both options, user namespaces with and without CAP_SYS_ADMIN, can be
>> said to represent the extreme end of the spectrum. In practice, there is
>> no reason for every process to have the ability to create user
>> namespaces.
Re: [PATCH] userns/capability: Add user namespace capability
On 17.10.2015 22:17, Richard Weinberger wrote:
> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus wrote:
>> One question remains though: Does this break userspace executables that
>> expect being able to create user namespaces without privilege? Since
>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>> Linux 3.8, programs should already expect a potential EPERM upon calling
>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>> also cause EPERM, we should be on the safe side.
>
> In case of doubt, yes it will break existing software.
> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
> make them secure.
>

The goal is not to make user namespaces secure, but to limit access to
them somewhat in order to reduce the potential attack surface. Please
see my reply to Serge for further details.
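The compatibility question quoted above (whether callers already cope
with EPERM from clone or unshare) can be looked at from the userspace
side. What follows is only a minimal sketch and not part of the patch:
the helper names try_new_userns() and userns_denied() are hypothetical,
the list of "denied" errors is an assumption about what a cautious
caller might treat as "fall back", and the raw syscall is used merely to
keep the example independent of libc feature-test macros.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag value from <linux/sched.h>; defined here only as a fallback. */
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/*
 * Hypothetical helper: try to move the calling process into a new user
 * namespace. Returns 0 on success or a negative errno value on failure.
 */
int try_new_userns(void)
{
	if (syscall(SYS_unshare, CLONE_NEWUSER) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: true if the error means "user namespaces are not
 * available to this caller", in which case a program would fall back to
 * running without one instead of aborting.
 */
bool userns_denied(int err)
{
	return err == -EPERM ||  /* denied by policy (old kernel, or a capability gate) */
	       err == -EINVAL || /* e.g. kernel built without CONFIG_USER_NS */
	       err == -ENOSYS;   /* unshare() not implemented */
}
```

A caller written this way behaves the same whether the EPERM comes from
a pre-3.8 kernel or from a missing CAP_SYS_USER_NS; software that treats
any failure of unshare as fatal is the kind that would break.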
Re: [PATCH] userns/capability: Add user namespace capability
On 18.10.2015 22:21, Richard Weinberger wrote:
> On 18.10.2015 at 22:13, Tobias Markus wrote:
>> On 17.10.2015 22:17, Richard Weinberger wrote:
>>> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus wrote:
>>>> One question remains though: Does this break userspace executables that
>>>> expect being able to create user namespaces without privilege? Since
>>>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>>>> Linux 3.8, programs should already expect a potential EPERM upon calling
>>>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>>>> also cause EPERM, we should be on the safe side.
>>>
>>> In case of doubt, yes it will break existing software.
>>> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
>>> make them secure.
>>>
>> The goal is not to make user namespaces secure, but to limit access to
>> them somewhat in order to reduce the potential attack surface.
>
> We already have a framework to reduce the attack surface, seccomp.
> There is no need to invent new capabilities for every non-trivial
> kernel feature.
>
> I can understand that user namespaces seem scary and have had bugs.
> But which software hasn't?
>
> Are there any unfixed exploitable bugs in user namespaces in recent kernels?
>
> Thanks,
> //richard

Isn't seccomp about setting a per-thread syscall filter? Correct me if
I'm wrong, but I don't know of any way of using seccomp to globally ban
the use of clone or unshare with CLONE_NEWUSER except for a few
whitelisted executables, and that's the idea of this hypothetical
capability.

Sure, there are no known exploitable bugs in recent kernels, but would
you guarantee that for the next 10 years? Every piece of software has
bugs, some of them exploitable, and no amount of testing can prevent
that. I'm not paranoid, but on the other hand, why should every Linux
user having CONFIG_USER_NS enabled have to expose more attack surface
than he absolutely has to?

Richard, would you run an Apache HTTP server exposed to the internet on
your own laptop, without any security precautions? According to your
reasoning, Apache is surely scary and has many bugs, but every piece of
software has bugs, right?

I really don't want to introduce a user-facing API change just for the
fun of it - so if there's any better way to do this, please tell me.
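To make the seccomp point concrete, here is a rough sketch of what
"using seccomp against CLONE_NEWUSER" looks like in practice. It is an
illustration under stated assumptions, not something proposed in this
thread: the function name deny_unshare_newuser() is hypothetical, only
unshare is filtered (the position of the flags argument of clone differs
between architectures), and the seccomp_data.arch check that a real
filter should perform is left out for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/* Offset of the low 32 bits of the first syscall argument. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define ARG0_LO (offsetof(struct seccomp_data, args[0]))
#else
#define ARG0_LO (offsetof(struct seccomp_data, args[0]) + sizeof(__u32))
#endif

/*
 * Hypothetical helper: make unshare(CLONE_NEWUSER) fail with EPERM for
 * the calling thread and everything it later forks or execs.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
int deny_unshare_newuser(void)
{
	struct sock_filter filter[] = {
		/* A = syscall number */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		/* not unshare: skip the next three instructions, i.e. allow */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 0, 3),
		/* A = low word of the flags argument */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
		/* CLONE_NEWUSER set: fall through to EPERM, else skip to allow */
		BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Needed so an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The filter only binds the thread that installs it and that thread's
descendants, so it suits a service manager or sandbox that opts in for
its own children. It offers no way to say "nobody on this system except
these few binaries", which is the gap the proposed capability is aimed at.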
Re: [PATCH] userns/capability: Add user namespace capability
On 18.10.2015 22:48, Richard Weinberger wrote:
> On 18.10.2015 at 22:41, Tobias Markus wrote:
>> On 18.10.2015 22:21, Richard Weinberger wrote:
>>> On 18.10.2015 at 22:13, Tobias Markus wrote:
>>>> On 17.10.2015 22:17, Richard Weinberger wrote:
>>>>> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus wrote:
>>>>>> One question remains though: Does this break userspace executables that
>>>>>> expect being able to create user namespaces without privilege? Since
>>>>>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>>>>>> Linux 3.8, programs should already expect a potential EPERM upon calling
>>>>>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>>>>>> also cause EPERM, we should be on the safe side.
>>>>>
>>>>> In case of doubt, yes it will break existing software.
>>>>> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
>>>>> make them secure.
>>>>>
>>>> The goal is not to make user namespaces secure, but to limit access to
>>>> them somewhat in order to reduce the potential attack surface.
>>>
>>> We already have a framework to reduce the attack surface, seccomp.
>>> There is no need to invent new capabilities for every non-trivial
>>> kernel feature.
>>>
>>> I can understand that user namespaces seem scary and have had bugs.
>>> But which software hasn't?
>>>
>>> Are there any unfixed exploitable bugs in user namespaces in recent kernels?
>>>
>>> Thanks,
>>> //richard
>>
>> Isn't seccomp about setting a per-thread syscall filter? Correct me if
>> I'm wrong, but I don't know of any way of using seccomp to globally ban
>> the use of clone or unshare with CLONE_NEWUSER except for a few
>> whitelisted executables, and that's the idea of this hypothetical capability.
>
> This is correct.
> If you want it globally you can still use LSM.

The LSM isn't really the one-size-fits-all solution that distributions
like to ship in their standard kernels...

>
>> Sure, there are no known exploitable bugs in recent kernels, but would
>> you guarantee that for the next 10 years? Every software has bugs, some
>> of them exploitable, no amount of testing can prevent that. I'm not
>> paranoid, but on the other hand, why should every Linux user having
>> CONFIG_USER_NS enabled have to expose more attack surface than he
>> absolutely has to?
>
> And what about all the other kernel features?
> I really don't get why you choose user namespaces as your enemy.

I didn't choose user namespaces as my enemy. I chose user namespaces as
the feature that I would really like to have shipped by default by my
distribution and by others, but that is sadly often disabled because of
security concerns. Is there any solution that distributions can safely
use to have user namespaces enabled by default without worrying about
security?

>
>> Richard, would you run an Apache HTTP server exposed to the internet on
>> your own laptop, without any security precautions? According to your
>> reasoning, Apache is surely scary and has many bugs, but every software
>> has bugs, right?
>
> This argument is bogus and you know that too.

Sure, it's exaggerated, but still, if it's possible to reduce the attack
surface for every user without great effort, why shouldn't that be done?

To give an example more closely resembling the matter in hand:
CAP_SYSLOG allows viewing kernel addresses when kptr_restrict is
enabled. But why limit access to the kernel symbols? There is nothing an
attacker can do with them unless there is a kernel bug.

But before we continue arguing endlessly, I just got an idea: What about
adding a sysctl to enable/disable enforcement of the hypothetical
CAP_SYS_USER_NS, just like with /proc/sys/kernel/kptr_restrict and
CAP_SYSLOG? That would also prevent any potential userspace breakage.

>
>> I really don't want to introduce a user-facing API change just for the
>> fun of it - so if there's any better way to do this, please tell me.
>
> As I said, I really don't see why we should treat user namespaces in a
> special way. It is a kernel feature like many others are. If you don't
> trust it, disable it.

And there's the problem: It's either 100% (unprivileged user namespaces
without limit) or 0% (disable user namespaces entirely).

>
> Thanks,
> //richard
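The sysctl idea floated in this message can be written down as a small
decision function. This is a userspace model for discussion only, not
kernel code and not taken from any posted patch: the mode names, the
function name userns_create_check() and the choice of "0 = do not
enforce" as the default are all assumptions made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sysctl values, loosely modelled on kptr_restrict. */
enum userns_cap_mode {
	USERNS_CAP_OFF = 0,     /* current behaviour: no capability needed */
	USERNS_CAP_ENFORCE = 1, /* CAP_SYS_USER_NS needed at level 0 */
};

/*
 * Model of the proposed check. parent_level is the nesting level of the
 * user namespace the caller lives in (0 for the initial namespace) and
 * has_cap says whether the caller holds the hypothetical CAP_SYS_USER_NS.
 * Returns 0 if creation may proceed, -EPERM otherwise.
 */
int userns_create_check(enum userns_cap_mode mode, int parent_level,
			bool has_cap)
{
	if (mode == USERNS_CAP_OFF)
		return 0;       /* enforcement disabled: nothing changes */
	if (parent_level > 0)
		return 0;       /* nested namespaces keep working as before */
	return has_cap ? 0 : -EPERM;
}
```

With the default left at "off", existing software sees no change, and a
distribution or administrator can switch to enforcement once the few
binaries that need user namespaces carry the capability.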