Aditya Kali <adityak...@google.com> writes:

> On Tue, Sep 30, 2014 at 5:35 PM, Eric W. Biederman
> <ebied...@xmission.com> wrote:
>> Aditya Kali <adityak...@google.com> writes:
>>
>>> Hi all,
>>>
>>> I am trying to run a process with uid=0 inside userns. But in the when
>>> I also do capset() after setresuid(0, 0, 0), I am seeing inconsistent
>>> proc file permissions. Almost all the files in /proc/<pid>/ has global
>>> 'root' as owner and group even if the actual process uid is correctly
>>> changed.
>>>
>>> I wrote a simple program that demonstrate the issue:
>>>
>>> 1. parent, as global root (uid=0 in init_user_ns) fork()s a child
>>> 2. child:
>>>    a) unshare(CLONE_NEWUSER)
>>>    b) [wait for parent to write uid_map]
>>>    c) setresgid(id, id, id) ; setresuid(0, 0, 0);
>>>    d) conditionally call capset() to clear capabilities
>>>    e) execve(/bin/sleep)
>>> 3. parent:
>>>    a) populates child's uid_map and maps some uid to 0 inside userns. ex:
>>>       0 99 1
>>>    b) waitpid()
>>>
>>> (the actual program can be found at http://pastebin.com/f4P17VFn for
>>> your reference).
>>>
>>> When there is no capset() call after setresuid(0,0,0), everything is
>>> fine. But when I do a capset() to clear all capabilities, the 'owner'
>>> and 'group' of all the files under /proc/<child_pid>/ of the child
>>> process are reverted to global 'root' user.
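For readers without access to the pastebin, the numbered steps above can be approximated as follows. This is a minimal sketch and not the program from the pastebin: the names format_id_map(), write_id_map(), clear_caps(), child() and run_repro(), the two pipes used for synchronisation, the gid_map handling, the value 0 for "id" in step 2.c and the "60" argument to /bin/sleep are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/capability.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Format one id-map line "<inside> <outside> <count>\n". */
int format_id_map(char *buf, size_t len, unsigned in, unsigned out, unsigned n)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%u %u %u\n", in, out, n);
}

/* Step 3.a: map inside-id 0 to outside_id in /proc/<pid>/<file>. */
static int write_id_map(pid_t pid, const char *file, unsigned outside_id)
{
    char path[64], line[64];
    int fd, len, ok;

    snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, file);
    len = format_id_map(line, sizeof(line), 0, outside_id, 1);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ok = write(fd, line, (size_t)len) == (ssize_t)len;
    close(fd);
    return ok ? 0 : -1;
}

/* Step 2.d: clear all capability sets through the raw capset(2) syscall. */
static int clear_caps(void)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[2];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    return (int)syscall(SYS_capset, &hdr, data);
}

/* Child side, steps 2.a to 2.e. Never returns. */
static void child(int rd, int wr, int reset_caps)
{
    char c = 0;

    if (unshare(CLONE_NEWUSER) != 0)                         /* 2.a */
        _exit(1);
    if (write(wr, &c, 1) != 1 || read(rd, &c, 1) != 1)       /* 2.b */
        _exit(1);
    if (setresgid(0, 0, 0) != 0 || setresuid(0, 0, 0) != 0)  /* 2.c, id=0 assumed */
        _exit(1);
    if (reset_caps && clear_caps() != 0)                     /* 2.d */
        _exit(1);
    execl("/bin/sleep", "sleep", "60", (char *)NULL);        /* 2.e, "60" assumed */
    _exit(1);
}

/* Steps 1 and 3. Call as global root. Returns the wait status, or -1. */
int run_repro(unsigned outside_id, int reset_caps)
{
    int to_child[2], to_parent[2], status;
    char c = 0;
    pid_t pid;

    if (pipe(to_child) != 0 || pipe(to_parent) != 0)
        return -1;
    pid = fork();                                            /* 1 */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(to_child[1]);
        close(to_parent[0]);
        child(to_child[0], to_parent[1], reset_caps);
    }
    close(to_child[0]);
    close(to_parent[1]);
    /* Wait until the child has unshared, then write both maps (3.a). */
    if (read(to_parent[0], &c, 1) != 1 ||
        write_id_map(pid, "uid_map", outside_id) != 0 ||
        write_id_map(pid, "gid_map", outside_id) != 0) {
        close(to_child[1]);        /* child's read() fails and it exits */
        waitpid(pid, &status, 0);
        return -1;
    }
    printf("child_pid: %d\n", (int)pid);
    if (write(to_child[1], &c, 1) != 1)
        return -1;
    if (waitpid(pid, &status, 0) < 0)                        /* 3.b */
        return -1;
    return status;
}
```

Under these assumptions, run_repro(99, 0) corresponds to the run without capset() and run_repro(99, 1) to the run with it; /proc/<child_pid>/ can be inspected from another shell while the child sleeps.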
>>>
>>> # without capset (2.d):
>>> root@vm1# id
>>> uid=0(root) gid=0(root) groups=0(root)
>>>
>>> root@vm1# ./userns_uid0
>>> child_pid: 24277
>>> proc_file: /proc/24277/uid_map
>>> proc_file: /proc/24277/gid_map
>>> child resuming
>>>
>>> ^Z
>>> [1]+ Stopped ./userns_uid0
>>> root@vm1# cat /proc/24277/uid_map
>>> 0 99 1
>>> root@vm1# cat /proc/24277/status | grep -e "Uid:" -e "Gid:"
>>> Uid: 99 99 99 99
>>> Gid: 99 99 99 99
>>> root@vm1# ls -l /proc/24277/
>>> total 0
>>> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:31 attr
>>> -r-------- 1 nobody nobody 0 2014-09-30 16:31 auxv
>>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cgroup
>>> --w------- 1 nobody nobody 0 2014-09-30 16:31 clear_refs
>>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cmdline
>>> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 comm
>>> -rw-r--r-- 1 nobody nobody 0 2014-09-30 16:31 coredump_filter
>>> -r--r--r-- 1 nobody nobody 0 2014-09-30 16:31 cpuset
>>> ...
>>> [All files have owner='nobody' and group='nobody' ..
>>> same as that of the process]
>>>
>>> With the additional capset() call, the files under /proc/<child_pid>/
>>> are now owned by global root:
>>>
>>> root@vm1# ./userns_uid0 resetcaps
>>> child_pid: 24706
>>> proc_file: /proc/24706/uid_map
>>> proc_file: /proc/24706/gid_map
>>> child resuming
>>> resetting caps
>>> ^Z
>>> [2]+ Stopped ./userns_uid0 resetcaps
>>> root@vm1# cat /proc/24706/uid_map
>>> 0 99 1
>>> root@vm1# cat /proc/24706/status | grep -e "Uid:" -e "Gid:"
>>> Uid: 99 99 99 99
>>> Gid: 99 99 99 99
>>>
>>> [Everything as before till now]
>>>
>>> root@vm1# ls -l /proc/24706/
>>> total 0
>>> dr-xr-xr-x 2 nobody nobody 0 2014-09-30 16:47 attr
>>> -r-------- 1 root   root   0 2014-09-30 16:47 auxv
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 cgroup
>>> --w------- 1 root   root   0 2014-09-30 16:47 clear_refs
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 cmdline
>>> -rw-r--r-- 1 root   root   0 2014-09-30 16:47 comm
>>> -rw-r--r-- 1 root   root   0 2014-09-30 16:47 coredump_filter
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 cpuset
>>> ...
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 mountinfo
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 mounts
>>> -r-------- 1 root   root   0 2014-09-30 16:47 mountstats
>>> dr-xr-xr-x 5 nobody nobody 0 2014-09-30 16:47 net
>>> dr-x--x--x 2 root   root   0 2014-09-30 16:47 ns
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 numa_maps
>>> ...
>>> -r--r--r-- 1 root   root   0 2014-09-30 16:47 status
>>> -r-------- 1 root   root   0 2014-09-30 16:47 syscall
>>> dr-xr-xr-x 3 nobody nobody 0 2014-09-30 16:47 task
>>> ..
>>>
>>> Only the directories 'attr', 'net' and 'task' are owned by the uid=99.
>>> Rest all files are owned by global root.
>>>
>>> This behavior seems inconsistent. I ran this on 3.17 kernel. Can
>>> someone with expertise in this area explain if this is expected?
>>
>> So I am not quite certain what you are seeing.
>>
>> In general proc files are expected to be owned by the euid of a process.
>> However when the task_dumpable is cleared the files become owned by the
>> global root user. We have considered relaxing that to the namespace
>> root user but so far implementing a more granular task_dumpable has not
>> been done.
>>
>
> I tried explicitly setting PR_SET_DUMPABLE before execve(), but that
> didn't either.
>
>> The directories are world readable so they don't matter.
>>
>> What puzzles me is that you have directories owned by nobody, and you
>> are talking about uid = 99 and gid = 99. Nobody is traditionally
>> (u16_t)-2 and there should never actually be used by anyone. And is
>> used as the default number of unmapped uids and gids.
>>
>> It looks like you are doing something weird with nobody so I don't have
>> a clue what is actually going on.
>>
>
> The issue is not specific to uid 99 or "nobody". Its just a dummy user
> I have for testing. The issue happens with any user with non-zero uid.
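The dumpable rule quoted above (proc files follow the euid unless dumpable is cleared) can be observed from inside a process with something like the sketch below. The helper names get_dumpable(), proc_self_owner() and report_owner() are made up for this illustration; only prctl(PR_GET_DUMPABLE) and stat() are real interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the caller's dumpable flag as reported by prctl(2), or -1 on error. */
int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Store the owner uid of /proc/self/<name> in *uid. Returns 0 on success. */
int proc_self_owner(const char *name, uid_t *uid)
{
    char path[256];
    struct stat st;

    assert(name != NULL && uid != NULL);
    if (snprintf(path, sizeof(path), "/proc/self/%s", name) >= (int)sizeof(path))
        return -1;
    if (stat(path, &st) != 0)
        return -1;
    *uid = st.st_uid;
    return 0;
}

/* Print the dumpable flag next to the euid and the owner of one proc file. */
void report_owner(const char *name)
{
    uid_t owner;

    if (proc_self_owner(name, &owner) != 0) {
        perror(name);
        return;
    }
    printf("dumpable=%d euid=%u owner(/proc/self/%s)=%u\n",
           get_dumpable(), (unsigned)geteuid(), name, (unsigned)owner);
}
```

Calling report_owner("status") before and after prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) in an unprivileged process should, if the rule holds, show the owner switching from the euid to root. Note that prctl(2) documents cases in which the kernel resets the flag after credential changes, so a value set earlier in the process is not guaranteed to still be in effect later.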
My issue with reading your directory listings of proc is that I can't
tell whether you are giving me a listing of proc from a process inside
the user namespace or outside of it.

If process 24706 has uid == 99 and gid == 99 outside of the user
namespace, you are listing the files from outside of the user
namespace, uid 99 is mapped to nobody in /etc/passwd and gid 99 is
mapped to nobody in /etc/group, and your ls process is not running in
your user namespace, then this looks like proper handling of dumpable.

Otherwise I don't have a clue what is going on, because I can't make
sense of your directory listings.

Eric
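One way to settle the "inside or outside the user namespace" question is to compare the /proc/<pid>/ns/user links of the two processes. The following is a minimal sketch; userns_id() and same_userns() are hypothetical helper names, and reading another process's ns link needs sufficient privilege over that process.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of /proc/<who>/ns/user into buf. Returns 0 on success. */
int userns_id(const char *who, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    assert(who != NULL && buf != NULL && len > 0);
    if (snprintf(path, sizeof(path), "/proc/%s/ns/user", who) >= (int)sizeof(path))
        return -1;
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}

/*
 * Return 1 if both processes are in the same user namespace, 0 if not,
 * and -1 if either link cannot be read. Each argument is a pid written
 * as a string, or "self".
 */
int same_userns(const char *a, const char *b)
{
    char ida[64], idb[64];

    if (userns_id(a, ida, sizeof(ida)) != 0 ||
        userns_id(b, idb, sizeof(idb)) != 0)
        return -1;
    return strcmp(ida, idb) == 0;
}
```

Run from the shell that produced the ls output, same_userns("self", "24706") returning 0 would indicate that the listing was taken from outside the child's user namespace.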