On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> cap_setfcap is required to create file capabilities.
>
> Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> process running as uid 0 but without cap_setfcap is able to work around
> this as follows: unshare a new user namespace which maps parent uid 0
> into the child namespace. While this task will not have new
> capabilities against the parent namespace, there is a loophole due to
> the way namespaced file capabilities are represented as xattrs. File
> capabilities valid in userns 1 are distinguished from file capabilities
> valid in userns 2 by the kuid which underlies uid 0. Therefore the
> restricted root process can unshare a new self-mapping namespace, add a
> namespaced file capability onto a file, then use that file capability in
> the parent namespace.
>
> To prevent that, do not allow mapping parent uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
>
> As a further wrinkle: a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid. In this
> case we do not have the credential from before unshare, which was
> potentially more restricted. So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP. Then we can use that
> during map_write().
>
> With this patch:
>
> 1. Unprivileged user can still unshare -Ur
>
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
>
> 2. Root user can still unshare -Ur
>
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
>
> 3. Root user without CAP_SETFCAP cannot unshare -Ur:
>
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
>
> Note: an alternative solution would be to allow uid 0 mappings by
> processes without CAP_SETFCAP, but to prevent such a namespace from
> writing any file capabilities. This approach can be seen here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
>
Ah, can you link to the previous fix and its revert, please? I think that
was mentioned in the formerly private thread as well but we forgot:

commit 95ebabde382c371572297915b104e55403674e73
Author: Eric W. Biederman <ebied...@xmission.com>
Date:   Thu Dec 17 09:42:00 2020 -0600

    capabilities: Don't allow writing ambiguous v3 file capabilities

commit 3b0c2d3eaa83da259d7726192cf55a137769012f
Author: Eric W. Biederman <ebied...@xmission.com>
Date:   Fri Mar 12 15:07:09 2021 -0600

    Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")

> Signed-off-by: Serge Hallyn <se...@hallyn.com>
> Reviewed-by: Andrew G. Morgan <mor...@kernel.org>
> Tested-by: Christian Brauner <christian.brau...@ubuntu.com>
> Reviewed-by: Christian Brauner <christian.brau...@ubuntu.com>
> Cc: "Eric W. Biederman" <ebied...@xmission.com>
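
For readers following along, here is a small user-space model of the rule the quoted commit message describes. It is only a sketch of my reading of that description, not the patch's code: the names `Extent`, `maps_parent_root`, `root_map_allowed` and all of their parameters are made up for illustration. The assumption is that a uid_map write is refused only when it maps parent uid 0, and that the deciding credential is either the CAP_SETFCAP state recorded when the namespace was created (the task unshared and writes its own map) or the CAP_SETFCAP state of whoever opened uid_map (a task outside the namespace writes the map).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    """One uid_map line: '<first> <lower_first> <count>' (hypothetical model)."""
    first: int        # first uid inside the new namespace
    lower_first: int  # first uid in the parent namespace
    count: int        # number of consecutive uids mapped


def maps_parent_root(extents: List[Extent]) -> bool:
    # uids are non-negative, so a range covers parent uid 0
    # exactly when it is non-empty and starts at 0.
    return any(e.count > 0 and e.lower_first == 0 for e in extents)


def root_map_allowed(extents: List[Extent],
                     writer_is_ns_member: bool,
                     parent_could_setfcap: bool,
                     opener_has_setfcap: bool) -> bool:
    """Model of the check; all parameter names are assumptions.

    writer_is_ns_member  -- the task unshared first and writes its own uid_map
    parent_could_setfcap -- recorded at namespace creation: did the creator
                            hold CAP_SETFCAP?
    opener_has_setfcap   -- does the task that opened uid_map hold CAP_SETFCAP?
    """
    if not maps_parent_root(extents):
        return True  # mappings that do not touch parent uid 0 are unaffected
    if writer_is_ns_member:
        # The pre-unshare credential is gone, so use the recorded flag.
        return parent_could_setfcap
    return opener_has_setfcap


if __name__ == "__main__":
    # Scenario 1: unprivileged uid 1000 maps itself to root in the child.
    print(root_map_allowed([Extent(0, 1000, 1)], True, False, False))
    # Scenario 2: full root maps parent uid 0.
    print(root_map_allowed([Extent(0, 0, 1)], True, True, True))
    # Scenario 3: root after dropping cap_setfcap maps parent uid 0.
    print(root_map_allowed([Extent(0, 0, 1)], True, False, False))
```

The three calls under `__main__` correspond to the three `unshare -Ur` scenarios in the quoted test log; under the assumptions above, only the third one is refused.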