[ https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323818#comment-15323818 ]
Jie Yu edited comment on MESOS-5544 at 6/10/16 4:45 AM:
--------------------------------------------------------

Worked on a prototype here:
https://github.com/jieyu/mesos/tree/agent_in_docker

Tested with the following docker container and run command. The agent recovery works well!

{noformat}
sudo docker run \
    --net=host \
    --pid=host \
    --privileged \
    -e GLOG_v=1 \
    -e LIBPROCESS_IP=10.0.2.15 \
    -e MESOS_MASTER=10.0.2.15:5050 \
    -e MESOS_HOSTNAME=localhost \
    -e MESOS_WORK_DIR=/var/lib/mesos \
    -e MESOS_DOCKER_STORE_DIR=/var/lib/mesos/store \
    -e MESOS_ISOLATION=cgroups/cpu,cgroups/mem,docker/runtime,filesystem/linux \
    -e MESOS_IMAGE_PROVIDERS=docker \
    -e MESOS_SYSTEMD_ENABLE_SUPPORT=false \
    -v /var/run/mesos:/var/run/mesos:shared \
    -v /var/lib/mesos:/var/lib/mesos:shared \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /cgroup:/cgroup \
    -v /sys:/sys \
    -v /usr/local/bin/docker:/usr/local/bin/docker \
    jieyu/mesos-from-source mesos-slave
{noformat}

was (Author: jieyu):
Worked on a prototype here:
https://github.com/jieyu/mesos/tree/agent_in_docker

Will update with a docker image soon.

> Support running Mesos agent in a Docker container.
> --------------------------------------------------
>
>                 Key: MESOS-5544
>                 URL: https://issues.apache.org/jira/browse/MESOS-5544
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Jie Yu
>
> Currently, this does not work if one tries to use the Mesos containerizer.
> The main problem is that we want to make sure the executor is not killed when
> the agent crashes. So we have to use --pid=host so that the agent is in the
> host pid namespace.
> But that is not sufficient: the Docker daemon will put the agent into all
> cgroups available on the host. We need to make sure we migrate the executor
> pid out of those cgroups so that when the agent crashes, executors are not
> killed.
> Also, when starting the agent container, volumes need to be set up properly
> so that any mounts under the agent's work_dir are propagated back to the host
> mount table. This is to make sure we can recover those mounts after the agent
> restarts. This is also true for those mounts that are needed by some
> isolators (e.g., the network/cni isolator).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
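
The issue description relies on mounts under the agent's work_dir being propagated back to the host mount table, which requires the volume to have shared propagation. One way to check this is sketched below, assuming a Linux host. The helper name is_shared_mount is hypothetical and not part of Mesos or Docker. It relies only on the /proc/self/mountinfo format, where a "shared:N" tag in the optional fields marks a shared mount. The second argument exists only so the function can be run against a saved copy of the file.

```shell
# Hypothetical helper (not part of Mesos or Docker): succeeds if the given
# path is a mount point whose propagation type is shared.
#   $1: mount point to look up, e.g. /var/lib/mesos
#   $2: mountinfo file to read (defaults to /proc/self/mountinfo)
# Paths containing spaces (escaped as \040 in mountinfo) are not handled.
is_shared_mount() {
  awk -v target="$1" '
    $5 == target {
      seen = 1
      shared = 0
      # Optional fields start at field 7 and end at the "-" separator.
      for (i = 7; i <= NF && $i != "-"; i++) {
        if ($i ~ /^shared:/) shared = 1
      }
    }
    # If the path is mounted more than once, the last entry decides.
    END {
      if (seen && shared) exit 0
      exit 1
    }
  ' "${2:-/proc/self/mountinfo}"
}
```

Run inside the agent container, `is_shared_mount /var/lib/mesos && echo shared` would print "shared" only when that volume is a mount point with shared propagation. A path that is not itself a mount point is reported as not shared.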