[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036024#comment-15036024 ]

Yong Tang commented on MESOS-3219:
----------------------------------

If your system can assume that the container always keeps running, then a partial workaround is to have a shell script restart the mesos slave in a loop within the container. The shell script then serves as the foreground process, so the container will not die. If the mesos slave process itself dies, the shell script will at least restart it, and the slave can then recover correctly. That is obviously not a complete solution, but it may help in certain situations.

> Slave recovery issues with Docker containerizer
> -----------------------------------------------
>
> Key: MESOS-3219
> URL: https://issues.apache.org/jira/browse/MESOS-3219
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Anderson
> Assignee: Timothy Chen
> Priority: Minor
>
> I'm working on setting up a Mesos environment with the
> Docker containerizer and can't seem to get the recovery feature
> working. I'm running CoreOS, so the slave processes themselves are
> containerized. I have no issues running jobs without the recovery
> features enabled, but all jobs fail to boot when I add the following
> flags:
>
> MESOS_DOCKER_KILL_ORPHANS=false
> MESOS_DOCKER_MESOS_IMAGE=myrepo/my-slave-container
>
> Inspecting the Docker images and their log output reveals that the
> container invocation appears to be flawed - see this gist, which shows the
> arguments as retrieved via `docker inspect` as well as the failed container's
> log output:
>
> https://gist.github.com/banjiewen/a2dc1784a82ed87edd6b
>
> The containerizer is attempting to invoke an unquoted command via
> `/bin/sh -c`, which, predictably, fails to pass the complete command.
> This results in the error message shown in the second file in the
> linked gist.
>
> This is reproducible manually; quoting the arguments to `/bin/sh -c`
> results in success (at least, it correctly receives the supplied
> arguments).
>
> The slave container itself is not logging anything of interest.
>
> It's possible that my instance is configured incorrectly as well; the
> documentation here is a bit vague and there aren't many examples on the web.
>
> I'm running Mesos 0.23.0 installed via http://repos.mesosphere.io/ in an
> Ubuntu 14.04 container. CoreOS is at the latest stable (717.3.0), which gives
> a Docker version at about 1.6.2.
>
> I'm happy to provide more details if necessary. Cheers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
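
The restart-loop workaround suggested in the comment above could look roughly like the sketch below. It is only an illustration, not a script taken from the issue: the `supervise` function and the `MAX_RUNS` and `RESTART_DELAY` variables are hypothetical names introduced here, and the commented-out entrypoint line assumes that `mesos-slave` is on the `PATH` and is configured through `MESOS_*` environment variables.

```shell
#!/bin/sh
# Hypothetical supervisor loop: this script stays in the foreground as the
# container's main process and re-runs the wrapped command whenever it exits.
# MAX_RUNS and RESTART_DELAY are illustrative knobs, not Mesos options.

supervise() {
    attempts=0
    while :; do
        "$@"                      # run the wrapped command in the foreground
        exit_code=$?
        attempts=$((attempts + 1))
        echo "command exited with code $exit_code (run $attempts)" >&2
        # MAX_RUNS=0 (the default) means "restart forever".
        if [ "${MAX_RUNS:-0}" -gt 0 ] && [ "$attempts" -ge "$MAX_RUNS" ]; then
            return "$exit_code"
        fi
        sleep "${RESTART_DELAY:-5}"
    done
}

# Example container entrypoint (assumption: mesos-slave is on PATH):
# supervise mesos-slave
```

As the comment itself notes, this only covers the case where the slave process dies while the container survives; it does not help if the container itself is stopped.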
[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008780#comment-15008780 ]

Yong Tang commented on MESOS-3219:
----------------------------------

Hi [~tnachen],

I encountered a similar issue when I tried to pass --docker_mesos_image=mesosphere/mesos-slave:0.25.0-0.2.70.ubuntu1404, which is the same problem as described in this email thread (CC [~gregory90]):

https://www.mail-archive.com/user@mesos.apache.org/msg04975.html

Has there been any progress on this issue? I am happy to provide more details or help if needed.

Thanks
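
The issue description attributes the failure to an unquoted command being passed to `/bin/sh -c`. The general shell behavior behind that can be seen with a small sketch like the one below. It uses a placeholder `echo hello world` command rather than the actual invocation from the linked gist, which is not reproduced here.

```shell
#!/bin/sh
# Unquoted: sh -c takes only "echo" as the command string. "hello" becomes $0
# and "world" becomes $1 of that shell, so echo runs with no arguments.
unquoted=$(/bin/sh -c echo hello world)

# Quoted: the whole string is the command, so echo receives both words.
quoted=$(/bin/sh -c 'echo hello world')

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

With the unquoted form the captured output is empty, while the quoted form yields `hello world`, which matches the report that quoting the arguments lets the command receive what was supplied.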
[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008791#comment-15008791 ]

Grzegorz Graczyk commented on MESOS-3219:
-----------------------------------------

I've provided some more information about this issue here: https://github.com/mesosphere/docker-containers/issues/6 . I think I've tried everything, and no one has been able to help with this. It's really a blocker for me for running Mesos on CoreOS/in a container.