James Peach created MESOS-8313:
----------------------------------

             Summary: Provide a host namespace container supervisor.
                 Key: MESOS-8313
                 URL: https://issues.apache.org/jira/browse/MESOS-8313
             Project: Mesos
          Issue Type: Improvement
          Components: containerization
            Reporter: James Peach
            Assignee: James Peach

After more investigation on user namespaces, the current implementation of creating the container namespaces needs some adjustment before we can implement user namespaces in a usable fashion. The problems we need to address are:

1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace to mount {{procfs}}. Currently, this prevents containers from joining the host PID namespace. The workaround is to always create a new container PID namespace (as a child of the user namespace) with the {{namespaces/pid}} isolator.
2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network namespace to mount {{sysfs}}. There's no general workaround for this since we can't generally require containers to not join the host network namespace.
3. The containerizer can't enter a user namespace after entering the {{chroot}}. This restriction makes it impossible to keep the existing order of containerizer operations in the case where we want the executor to be in a new user namespace that has no children (i.e. to protect the container from a privileged task).

After some discussion with [~jieyu], we believe that we can solve most or all of these issues by creating a new containerizer supervisor that runs fully outside the container and is responsible for constructing the rootfs mount namespace, launching the containerizer to enter the rest of the container, and waiting on the entered process.

Since this new supervisor process is not running in the user namespace, it will be able to construct the container rootfs in a new mount namespace without user namespace restrictions. We can then clone a child to fully create and enter container namespaces along with the prefabricated rootfs mount namespace. The only drawback to this approach is that the container's mount namespace will be owned by the root user namespace rather than the container user namespace. We are OK with this for now.
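
To make the proposed process shape concrete, here is a minimal sketch of the supervise/launch split in Python. It is not Mesos code and makes no claim about the eventual implementation. The names {{supervise}} and {{prepare_rootfs}} are hypothetical, and the privileged work (unsharing the mount namespace and building the rootfs) is left as a caller-supplied hook, so the sketch only shows the ordering: prepare in the supervisor, fork a child that execs the launcher, then wait on it.

```python
import os


def supervise(launch_argv, prepare_rootfs=None):
    """Run launch_argv as a child process and return its exit status.

    prepare_rootfs is a hypothetical hook standing in for the privileged
    setup (new mount namespace, container rootfs). It runs in the
    supervisor, before the child exists and outside any user namespace.
    """
    if prepare_rootfs is not None:
        prepare_rootfs()

    pid = os.fork()
    if pid == 0:
        # Child: a real launcher would create and enter the remaining
        # container namespaces here before exec'ing the executor.
        try:
            os.execvp(launch_argv[0], launch_argv)
        except OSError:
            # exec failed; use the conventional "command not found" status.
            os._exit(127)

    # Supervisor: wait on the entered process and report how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # Shell convention (an assumption here): 128 + signal number.
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

For example, {{supervise(["/bin/sh", "-c", "exit 7"])}} returns 7. Because the hook runs before the fork, anything it sets up is inherited by the child, which mirrors the idea of handing the launcher a prefabricated rootfs mount namespace.
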

The plan here is to retain the existing {{mesos-containerizer launch}} subcommand and add a new {{mesos-containerizer supervise}} subcommand, which will be its parent process. This new subcommand will be used for the default executor and custom executor code paths.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)