James Peach created MESOS-8313:
----------------------------------

             Summary: Provide a host namespace container supervisor.
                 Key: MESOS-8313
                 URL: https://issues.apache.org/jira/browse/MESOS-8313
             Project: Mesos
          Issue Type: Improvement
          Components: containerization
            Reporter: James Peach
            Assignee: James Peach


After more investigation into user namespaces, it is clear that the way we
currently create the container namespaces needs some adjustment before we can
implement user namespaces in a usable fashion.

The problems we need to address are:

1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace to 
mount {{procfs}}. Currently, this prevents containers joining the host PID 
namespace. The workaround is to always create a new container PID namespace (as 
a child of the user namespace) with the {{namespaces/pid}} isolator.

2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network namespace
to mount {{sysfs}}. There's no general workaround for this since we can't
require containers to stay out of the host network namespace.

3. The containerizer can't enter a user namespace after entering the
{{chroot}}. This restriction makes it impossible to keep the existing order of
containerizer operations in the case where we want the executor to be in a new
user namespace that has no children (i.e. to protect the container from a
privileged task).

After some discussion with [~jieyu], we believe that we can solve most or all of
these issues by creating a new containerizer supervisor that runs fully outside
the container and is responsible for constructing the rootfs mount namespace,
launching the containerizer to enter the rest of the container, and waiting on
the entered process.

Since this new supervisor process is not running in the user namespace, it will 
be able to construct the container rootfs in a new mount namespace without user 
namespace restrictions. We can then clone a child to fully create and enter 
container namespaces along with the prefabricated rootfs mount namespace.
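
The split described above can be sketched as two sets of namespace flags. This
is only an illustration of one reading of the proposal, not the Mesos
implementation: the function names are hypothetical, and it assumes the
supervisor unshares only the mount namespace while the cloned child inherits
that mount namespace and creates the remaining ones. The flag values are the
constants from {{<linux/sched.h>}}.

```python
# Namespace flag values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount namespace
CLONE_NEWUSER = 0x10000000  # user namespace
CLONE_NEWPID = 0x20000000   # PID namespace
CLONE_NEWNET = 0x40000000   # network namespace


def supervisor_unshare_flags() -> int:
    """Flags a hypothetical supervisor would unshare before building the rootfs.

    The supervisor stays in the host user namespace, so the new mount
    namespace is owned by the root user namespace.
    """
    return CLONE_NEWNS


def child_clone_flags(new_pid_ns: bool, new_net_ns: bool) -> int:
    """Flags for the cloned child (assumption: it always gets a new user namespace).

    CLONE_NEWNS is deliberately absent: the child inherits the mount
    namespace that the supervisor already prepared.
    """
    flags = CLONE_NEWUSER
    if new_pid_ns:
        flags |= CLONE_NEWPID
    if new_net_ns:
        flags |= CLONE_NEWNET
    return flags
```

With this split, a container that joins the host PID and network namespaces
would be cloned with only {{CLONE_NEWUSER}}, since the rootfs mounts were
already made by the supervisor.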

The only drawback to this approach is that the container's mount namespace will 
be owned by the root user namespace rather than the container user namespace. 
We are OK with this for now.

The plan here is to retain the existing {{mesos-containerizer launch}}
subcommand and add a new {{mesos-containerizer supervise}} subcommand, which
will run as the parent process of {{mesos-containerizer launch}}. This new
subcommand will be used for the default executor and custom executor code
paths.
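
The parent/child relationship between the two subcommands can be sketched as
follows. This is a minimal sketch of the "start the child and wait on it" part
only, with no namespace or rootfs setup; the {{supervise}} function and its
argument are hypothetical, and the real {{mesos-containerizer launch}} command
line is not reproduced here.

```python
import subprocess
import sys
from typing import Sequence


def supervise(launch_argv: Sequence[str]) -> int:
    """Run the launch command as a child process and wait for it to exit.

    Returns a shell-style exit status: the child's exit code, or
    128 + signal number if the child was killed by a signal.
    """
    child = subprocess.Popen(list(launch_argv))
    status = child.wait()
    # Popen reports death by signal N as -N; convert that to 128 + N.
    return 128 - status if status < 0 else status
```

For example, {{supervise([sys.executable, "-c", "raise SystemExit(3)"])}}
returns 3, which the supervisor could then report as the container's exit
status.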



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
