Ricardo Wurmus writes:
> Hi Guix,
>
> I see that the container script generated by “guix system container”
> must be run as root. Looking at “initialize-user-namespace” in (gnu
> build linux-container) there is conditional code to be executed only
> when running as an unprivileged user, namely writing to
> /proc/pid/setgroups. This makes me think that this was originally meant
> to be usable without root privileges.
>
> Without root privileges write access to /proc/pid/* is denied. The
> child process here is the result of issuing a clone syscall.
>
> Why can’t the parent process write to the child’s /proc/pid/* files?

“man 7 user_namespaces” lists the conditions that must be met for a
parent process to write to /proc/childpid/uid_map. Violating any one
of them results in EPERM. It seems that writing to
/proc/childpid/setgroups succeeds and only the writes to uid_map and
gid_map fail.

The parent process should be able to write to each of these files
once (the kernel accepts only a single write per map); as the parent
it should have the capabilities CAP_SETUID and CAP_SETGID in the child
process's user namespace.
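
For reference, here is a minimal Python sketch of the write sequence as I
read it in user_namespaces(7). It is not the code in (gnu build
linux-container). The helper names "id_map_line" and "write_child_maps"
and the "proc_root" parameter are made up for illustration. It assumes an
unprivileged parent, which may only map its own effective UID and GID and
has to write "deny" to setgroups before gid_map.

```python
import os


def id_map_line(inside_id, outside_id, count=1):
    """Format one uid_map/gid_map entry: ID inside the namespace,
    ID in the parent namespace, and the length of the range."""
    return f"{inside_id} {outside_id} {count}\n"


def write_child_maps(child_pid, proc_root="/proc"):
    """Write setgroups, uid_map and gid_map for CHILD_PID, in that order.

    Assumption: the caller is the unprivileged parent of a child that was
    cloned into a new user namespace.  PROC_ROOT is a hypothetical
    parameter so that the function can be pointed at a scratch directory.
    """
    base = os.path.join(proc_root, str(child_pid))
    steps = [
        # Must come before gid_map when the writer lacks CAP_SETGID in
        # the parent namespace.
        ("setgroups", "deny"),
        # An unprivileged writer may only map its own effective IDs,
        # using a single-line map.
        ("uid_map", id_map_line(0, os.geteuid())),
        ("gid_map", id_map_line(0, os.getegid())),
    ]
    for name, data in steps:
        # Each map file accepts only one write, so the whole map is
        # written at once.
        with open(os.path.join(base, name), "w") as port:
            port.write(data)
    return steps
```

If one of these writes still fails with EPERM when run against a real
child, the map contents and the order of the writes would be the first
things I would compare against the man page.
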
--
Ricardo