[PATCH] userns/capability: Add user namespace capability

2015-10-17 Thread Tobias Markus
Add capability CAP_SYS_USER_NS.
Tasks having CAP_SYS_USER_NS are allowed to create a new user namespace
when calling clone or unshare with CLONE_NEWUSER.

Rationale:

Linux 3.8 saw the introduction of unprivileged user namespaces,
allowing unprivileged users (without CAP_SYS_ADMIN) to be a "fake" root
inside a separate user namespace. Before that, any namespace creation
required CAP_SYS_ADMIN (or, in practice, the user had to be root).
Unfortunately, there have been some security-relevant bugs in the
meantime. Because of the fairly complex nature of user namespaces, it is
reasonable to say that future vulnerabilities cannot be excluded. Some
distributions even wholly disable user namespaces because of this.

Both options, user namespaces with and without CAP_SYS_ADMIN, can be
said to represent the two extreme ends of the spectrum. In practice, there
is no reason for every process to have the ability to create user
namespaces. Indeed, only very few and specialized programs require user
namespaces. This seems to be a perfect fit for the (file) capability
system: Privileged users could manually allow only a certain executable
to create user namespaces by setting a certain capability;
I'd suggest the name CAP_SYS_USER_NS. Executables completely unrelated
to user namespaces should not, and then could not, create them.

The capability should only be required in the "root" user namespace (the
user namespace with level 0) though, to allow nested user namespaces to
work as intended. If a user namespace has a level greater than 0, the
original process must have had CAP_SYS_USER_NS, so it is "trusted" anyway.

One question remains though: Does this break userspace executables that
expect to be able to create user namespaces without privilege? Since
creating user namespaces without CAP_SYS_ADMIN was not possible before
Linux 3.8, programs should already expect a potential EPERM upon calling
clone. Since creating a user namespace without CAP_SYS_USER_NS would
also cause EPERM, we should be on the safe side.
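
A minimal sketch of the fallback this assumes on the userspace side. The
helper name try_new_userns() is hypothetical and not part of this patch;
it only illustrates a caller treating EPERM from unshare(CLONE_NEWUSER)
as "run without a user namespace" rather than as a fatal error:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): try to move the calling
 * process into a new user namespace.
 *
 * Returns 1 if the namespace was created, 0 if the kernel refused and the
 * caller should fall back to running without one, -1 on any other error.
 */
int try_new_userns(void)
{
	if (unshare(CLONE_NEWUSER) == 0)
		return 1;

	switch (errno) {
	case EPERM:	/* denied by policy, e.g. a missing capability */
	case EINVAL:	/* e.g. kernel built without CONFIG_USER_NS */
		fprintf(stderr, "user namespaces unavailable: %s\n",
			strerror(errno));
		return 0;
	default:
		return -1;
	}
}
```

A caller would branch on the return value, for example by continuing
without a user namespace when it is 0.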

Cc: linux-security-mod...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: Eric W. Biederman 
Cc: Al Viro 
Cc: Serge Hallyn 
Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Christoph Lameter 
Cc: Michael Kerrisk 
Signed-off-by: Tobias Markus 
---
 include/uapi/linux/capability.h     | 5 ++++-
 kernel/cred.c                       | 7 ++++++-
 security/selinux/include/classmap.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 12c37a1..d83540f 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -351,8 +351,11 @@ struct vfs_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/* Allow creating user namespaces (CLONE_NEWUSER) using clone() and unshare() */
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_SYS_USER_NS      38
+
+#define CAP_LAST_CAP         CAP_SYS_USER_NS
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/kernel/cred.c b/kernel/cred.c
index 71179a0..847d499 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -345,7 +345,12 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		return -ENOMEM;
 
 	if (clone_flags & CLONE_NEWUSER) {
-		ret = create_user_ns(new);
+		if (new->user_ns->level == 0 &&
+		    !has_capability(p, CAP_SYS_USER_NS)) {
+			ret = -EPERM;
+		} else {
+			ret = create_user_ns(new);
+		}
 		if (ret < 0)
 			goto error_put;
 	}
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 5a4eef5..07cec76 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -39,7 +39,7 @@ struct security_class_mapping secclass_map[] = {
 	    "linux_immutable", "net_bind_service", "net_broadcast",
 	    "net_admin", "net_raw", "ipc_lock", "ipc_owner", "sys_module",
 	    "sys_rawio", "sys_chroot", "sys_ptrace", "sys_pacct", "sys_admin",
-	    "sys_boot", "sys_nice", "sys_resource", "sys_time",
+	    "sys_boot", "sys_nice", "sys_resource", "sys_time", "sys_user_ns",
 	    "sys_tty_config", "mknod", "lease", "audit_write",
 	    "audit_control", "setfcap", NULL } },
 	{ "filesystem",
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] userns/capability: Add user namespace capability

2015-10-18 Thread Tobias Markus
On 17.10.2015 23:55, Serge E. Hallyn wrote:
> On Sat, Oct 17, 2015 at 05:58:04PM +0200, Tobias Markus wrote:
>> Add capability CAP_SYS_USER_NS.
>> Tasks having CAP_SYS_USER_NS are allowed to create a new user namespace
>> when calling clone or unshare with CLONE_NEWUSER.
>>
>> Rationale:
>>
>> Linux 3.8 saw the introduction of unprivileged user namespaces,
>> allowing unprivileged users (without CAP_SYS_ADMIN) to be a "fake" root
>> inside a separate user namespace. Before that, any namespace creation
>> required CAP_SYS_ADMIN (or, in practice, the user had to be root).
>> Unfortunately, there have been some security-relevant bugs in the
>> meantime. Because of the fairly complex nature of user namespaces, it is
>> reasonable to say that future vulnerabilities cannot be excluded. Some
>> distributions even wholly disable user namespaces because of this.
> 
> Fwiw I'm not in favor of this.  Debian has a patch (I believe the one
> I originally wrote for Ubuntu but which Ubuntu dropped long ago) adding a
> sysctl, off by default, for enabling user namespaces.

While it certainly works, enabling a feature like this at runtime
doesn't seem like a long term solution.

The fact that Debian added this patch in the first place already
demonstrates that there is demand for a way to limit unprivileged user
namespace creation. Please, don't get me wrong: I would *really like* to
see widespread adoption and continued development of user namespaces!
But the status quo remains: Distributions outright disabling user
namespaces (e.g. Arch Linux) won't make it easier.

> 
> Posix capabilities are intended for privileged actions, not for
> actions which explicitly should not require privilege, but which
> we feel are in development.
> 

Certainly, in an ideal world, user namespaces will never lead to any
kernel-level exploits. But reality is different: There *have been*
serious kernel vulnerabilities due to user namespaces, and there *will
be* serious kernel vulnerabilities due to user namespaces.

Now, those are the alternatives imho:

* Status quo: Some distributions will disable user namespaces by default
in some way or another. Users wishing to use user namespaces will have to
use a custom kernel or enable a sysctl flag that was patched in by the
downstream developers. On distributions that enable user namespaces by
default, even users that don't wish to use them in the first place will
be affected by vulnerabilities.

* Adding a capability: First of all, there would be no need for any
downstream patches or custom kernels. Users that wish to use user
namespaces would only have to enable the capability on the affected
executables, if that hasn't been done by the package maintainers
already. Users that might not even know of user namespaces are left in
peace.

> In general, the feeling is that putting a feature like this behind a
> wall will only slow down the finding of any bugs, so I think the goal
> itself is questionable.  But the chosen means for achieving your goal
> are definitely wrong.

I'm not talking about removing user namespaces altogether or making them
impossible to use - as I said above, users wouldn't notice anything in
the best case. Replacing setuid binaries with capability-based ones has
been done for quite some time now and I don't think anyone complained.

I honestly don't see why adding a new capability would slow down finding
bugs. Not every program magically profits from user namespaces. Why
would, say, GCC, date or vim improve by using user namespaces? My point
is that use cases for user namespaces won't magically rain down from
Heaven just because it is possible to use them without privilege. And it
is hardly difficult to add the capability to those applications that
use user namespaces, is it? setcap cap_sys_user_ns+ep $binary doesn't
sound very complicated to me.

I would actually say not adding this capability would slow down finding
bugs since users are less inclined to enable the feature if they can't
limit its security impact.

Furthermore, saying "let's enable this complex security-relevant feature
by default and make it impossible to limit it to certain files so users
will find more bugs" is a fundamentally wrong approach to security imho.
First, you aren't likely to get more bug reports because distributions
aren't that willing to take risks. Second, even if you get more bug
reports, _the damage is already done_. Sysadmins won't be that happy and
will very likely disable the very feature that caused the damage in the
first place.

> 
>> Both options, user namespaces with and without CAP_SYS_ADMIN, can be
>> said to represent the extreme end of the spectrum. In practice, there is
>> no reason for every process to have the ability to create user
>> n

Re: [PATCH] userns/capability: Add user namespace capability

2015-10-18 Thread Tobias Markus
On 17.10.2015 22:17, Richard Weinberger wrote:
> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus  wrote:
>> One question remains though: Does this break userspace executables that
>> expect to be able to create user namespaces without privilege? Since
>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>> Linux 3.8, programs should already expect a potential EPERM upon calling
>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>> also cause EPERM, we should be on the safe side.
> 
> In case of doubt, yes it will break existing software.
> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
> make them secure.
> 
The goal is not to make user namespaces secure, but to limit access to
them somewhat in order to reduce the potential attack surface.
Please see my reply to Serge for further details.


Re: [PATCH] userns/capability: Add user namespace capability

2015-10-18 Thread Tobias Markus
On 18.10.2015 22:21, Richard Weinberger wrote:
> Am 18.10.2015 um 22:13 schrieb Tobias Markus:
>> On 17.10.2015 22:17, Richard Weinberger wrote:
>>> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus  wrote:
>>>> One question remains though: Does this break userspace executables that
>>>> expect to be able to create user namespaces without privilege? Since
>>>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>>>> Linux 3.8, programs should already expect a potential EPERM upon calling
>>>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>>>> also cause EPERM, we should be on the safe side.
>>>
>>> In case of doubt, yes it will break existing software.
>>> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
>>> make them secure.
>>>
>> The goal is not to make user namespaces secure, but to limit access to
>> them somewhat in order to reduce the potential attack surface.
> 
> We have already a framework to reduce the attack surface, seccomp.
> There is no need to invent new capabilities for every non-trivial
> kernel feature.
> 
> I can understand that user namespaces seem scary and have had bugs.
> But which software didn't?
> 
> Are there any unfixed exploitable bugs in user namespaces in recent kernels?
> 
> Thanks,
> //richard

Isn't seccomp about setting a per-thread syscall filter? Correct me if
I'm wrong, but I don't know of any way of using seccomp to globally ban
the use of clone or unshare with CLONE_NEWUSER except for a few
whitelisted executables, and that's the idea of this hypothetical
capability.
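
For reference, a minimal sketch of what such a seccomp filter looks like,
assuming a little-endian Linux system. The helper name
deny_unshare_newuser() is made up for illustration, the usual
architecture check is omitted for brevity, and clone() with CLONE_NEWUSER
is not covered. The filter only binds the calling thread and whatever it
later forks or execs, so it would have to be installed by an ancestor of
every process that is to be restricted:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper: make unshare(CLONE_NEWUSER) fail with EPERM for
 * the calling thread and its future children.
 * Returns 0 on success, -1 with errno set on failure.
 */
int deny_unshare_newuser(void)
{
	struct sock_filter filter[] = {
		/* [0] Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* [1] Not unshare(): jump to ALLOW at [5]. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 0, 3),
		/* [2] Load the low 32 bits of the flags argument
		 *     (assumes little-endian layout). */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, args[0])),
		/* [3] CLONE_NEWUSER not set: jump to ALLOW at [5]. */
		BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
		/* [4] Deny with EPERM. */
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* [5] Allow everything else. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Needed so that a task without CAP_SYS_ADMIN may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Once installed, the filter cannot be removed again by the task, which is
why this works per process tree and not as a system-wide switch.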

Sure, there are no known exploitable bugs in recent kernels, but would
you guarantee that for the next 10 years? All software has bugs, some
of them exploitable, and no amount of testing can prevent that. I'm not
paranoid, but on the other hand, why should every Linux user having
CONFIG_USER_NS enabled have to expose more attack surface than they
absolutely have to?

Richard, would you run an Apache HTTP server exposed to the internet on
your own laptop, without any security precautions? According to your
reasoning, Apache is surely scary and has many bugs, but every software
has bugs, right?

I really don't want to introduce a user-facing API change just for the
fun of it - so if there's any better way to do this, please tell me.


Re: [PATCH] userns/capability: Add user namespace capability

2015-10-18 Thread Tobias Markus
On 18.10.2015 22:48, Richard Weinberger wrote:
> Am 18.10.2015 um 22:41 schrieb Tobias Markus:
>> On 18.10.2015 22:21, Richard Weinberger wrote:
>>> Am 18.10.2015 um 22:13 schrieb Tobias Markus:
>>>> On 17.10.2015 22:17, Richard Weinberger wrote:
>>>>> On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus  wrote:
>>>>>> One question remains though: Does this break userspace executables that
>>>>>> expect to be able to create user namespaces without privilege? Since
>>>>>> creating user namespaces without CAP_SYS_ADMIN was not possible before
>>>>>> Linux 3.8, programs should already expect a potential EPERM upon calling
>>>>>> clone. Since creating a user namespace without CAP_SYS_USER_NS would
>>>>>> also cause EPERM, we should be on the safe side.
>>>>>
>>>>> In case of doubt, yes it will break existing software.
>>>>> Hiding user namespaces behind CAP_SYS_USER_NS will not magically
>>>>> make them secure.
>>>>>
>>>> The goal is not to make user namespaces secure, but to limit access to
>>>> them somewhat in order to reduce the potential attack surface.
>>>
>>> We have already a framework to reduce the attack surface, seccomp.
>>> There is no need to invent new capabilities for every non-trivial
>>> kernel feature.
>>>
>>> I can understand that user namespaces seem scary and have had bugs.
>>> But which software didn't?
>>>
>>> Are there any unfixed exploitable bugs in user namespaces in recent kernels?
>>>
>>> Thanks,
>>> //richard
>>
>> Isn't seccomp about setting a per-thread syscall filter? Correct me if
>> I'm wrong, but I don't know of any way of using seccomp to globally ban
>> the use of clone or unshare with CLONE_NEWUSER except for a few
>> whitelisted executables, and that's the idea of this hypothetical capability.
> 
> This is correct.
> If you want it globally you can still use LSM.

The LSM isn't really the one-size-fits-all solution that distributions
like to ship in their standard kernels...

> 
>> Sure, there are no known exploitable bugs in recent kernels, but would
>> you guarantee that for the next 10 years? Every software has bugs, some
>> of them exploitable, no amount of testing can prevent that. I'm not
>> paranoid, but on the other hand, why should every Linux user having
>> CONFIG_USER_NS enabled have to expose more attack surface than he
>> absolutely has to?
> 
> And what about all the other kernel features?
> I really don't get why you choose user namespaces as your enemy.

I didn't choose user namespaces as my enemy, I chose user namespaces as
the feature that I would really like to see shipped by default by my
distribution and by others, but that is sadly often disabled over security
concerns. Is there any solution that can be safely used by distributions
to have user namespaces enabled by default without worrying about security?

> 
>> Richard, would you run an Apache HTTP server exposed to the internet on
>> your own laptop, without any security precautions? According to your
>> reasoning, Apache is surely scary and has many bugs, but every software
>> has bugs, right?
> 
> This argument is bogus and you know that too.

Sure, it's exaggerated, but still, if it's possible to reduce the
attack surface for every user without great effort, why shouldn't that
be done?

To give an example more closely resembling the matter at hand:
CAP_SYSLOG allows viewing kernel addresses when kptr_restrict is
enabled. But why limit access to the kernel symbols? There is nothing an
attacker can do with them unless there is a kernel bug.

But before we continue arguing endlessly, I just got an idea: What about
adding a sysctl to enable/disable enforcement of the hypothetical
CAP_SYS_USER_NS, just like with /proc/sys/kernel/kptr_restrict and
CAP_SYSLOG? Would also prevent any potential userspace breakage.
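
A rough userspace model of that idea, not kernel code. The function name
userns_create_allowed() and the "enforce" switch are hypothetical; the
switch stands in for a sysctl that would toggle enforcement the way
kptr_restrict does for CAP_SYSLOG:

```c
#include <errno.h>

/*
 * Hypothetical model of the proposed check.
 *
 * enforce:      value of the imagined sysctl (0 = capability not required)
 * parent_level: nesting level of the creating task's user namespace
 *               (0 = initial user namespace)
 * has_cap:      non-zero if the task holds the proposed CAP_SYS_USER_NS
 *
 * Returns 0 if namespace creation may proceed, -EPERM otherwise.
 */
int userns_create_allowed(int enforce, int parent_level, int has_cap)
{
	/* Only tasks in the initial namespace are subject to the check. */
	if (enforce && parent_level == 0 && !has_cap)
		return -EPERM;
	return 0;
}
```

With enforcement switched off, the model behaves like current kernels,
which is what would avoid the userspace breakage mentioned above.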

> 
>> I really don't want to introduce a user-facing API change just for the
>> fun of it - so if there's any better way to do this, please tell me.
> 
> As I said, I really don't see why we should treat user namespaces in a
> special way. It is a kernel feature like many others are. If you don't
> trust it, disable it.

And there's the problem: It's either 100% (unprivileged user namespaces
without limit) or 0% (disable user namespaces entirely).
> 
> Thanks,
> //richard
