[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer

2015-12-02 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036024#comment-15036024
 ] 

Yong Tang commented on MESOS-3219:
--

If your system can assume that the container always stays running, then a 
partial workaround is to have a shell script restart the Mesos slave in a 
loop inside the container.

This way, the shell script serves as the foreground process, so the 
container will not die.

If the Mesos slave process itself dies, the shell script will at least restart 
it so that it can recover correctly.

That obviously is not a complete solution but it may help in certain situations.
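
A minimal sketch of such a restart loop, assuming a POSIX shell is available 
in the slave image. The function name `supervise` and its two leading 
arguments (a restart limit and a delay in seconds) are made up for this 
illustration and are not part of Mesos; a limit of 0 means "loop forever".

```shell
#!/bin/sh
# supervise MAX_RESTARTS DELAY CMD [ARGS...]
# Runs CMD in the foreground and starts it again each time it exits, so this
# script (not CMD) is the process that keeps the container alive.
# MAX_RESTARTS=0 means never give up; a positive value bounds the loop.
supervise() {
    max_restarts=$1
    delay=$2
    shift 2
    runs=0
    while :; do
        "$@"                      # run the command with its arguments intact
        status=$?
        runs=$((runs + 1))
        echo "command exited with status $status (run $runs)" >&2
        if [ "$max_restarts" -gt 0 ] && [ "$runs" -ge "$max_restarts" ]; then
            return "$status"
        fi
        sleep "$delay"            # brief pause to avoid a tight crash loop
    done
}

# Hypothetical container entrypoint (left commented out on purpose):
# supervise 0 5 mesos-slave "$@"
```

With the last line enabled, the script would pass its own arguments through 
to mesos-slave and restart it five seconds after every exit. The limit 
argument exists mainly so that the loop can be exercised in a bounded way.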

> Slave recovery issues with Docker containerizer
> ---
>
> Key: MESOS-3219
> URL: https://issues.apache.org/jira/browse/MESOS-3219
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Anderson
> Assignee: Timothy Chen
> Priority: Minor
>
> I'm working on setting up a Mesos environment with the
> Docker containerizer and can't seem to get the recovery feature
> working. I'm running CoreOS, so the slave processes themselves are
> containerized. I have no issues running jobs without the recovery
> features enabled, but all jobs fail to boot when I add the following
> flags:
> MESOS_DOCKER_KILL_ORPHANS=false
> MESOS_DOCKER_MESOS_IMAGE=myrepo/my-slave-container
> Inspecting the Docker images and their log output reveals that the
> container invocation appears to be flawed - see this gist, which shows the 
> arguments as retrieved via `docker inspect` as well as the failed container's 
> log output:
> https://gist.github.com/banjiewen/a2dc1784a82ed87edd6b
> The containerizer is attempting to invoke an unquoted command via
> `/bin/sh -c`, which, predictably, fails to pass the complete command.
> This results in the error message shown in the second file in the
> linked gist.
> This is reproducible manually; quoting the arguments to `/bin/sh -c`
> results in success (at least, it correctly receives the supplied
> arguments).
> The slave container itself is not logging anything of interest.
> It's possible that my instance is configured incorrectly as well; the 
> documentation here is a bit vague and there aren't many examples on the web.
> I'm running Mesos 0.23.0 installed via http://repos.mesosphere.io/ in an 
> Ubuntu 14.04 container. CoreOS is at the latest stable (717.3.0) which gives 
> a Docker version at about 1.6.2.
> I'm happy to provide more details if necessary. Cheers.
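
The quoting behaviour described in the report can be reproduced with plain 
`/bin/sh`, independent of Mesos or Docker. The sketch below is only an 
illustration of general shell behaviour, using `echo hello world` as a 
stand-in command; it is not the containerizer's actual invocation, which is 
shown in the linked gist.

```shell
#!/bin/sh
# Unquoted: sh -c takes only the first word ("echo") as the command string.
# The remaining words become $0 and $1 of that command, not its arguments.
unquoted=$(/bin/sh -c echo hello world)

# Quoted: the whole string is the command, so echo receives both arguments.
quoted=$(/bin/sh -c 'echo hello world')

printf 'unquoted: [%s]\n' "$unquoted"   # prints: unquoted: []
printf 'quoted:   [%s]\n' "$quoted"     # prints: quoted:   [hello world]
```

In the unquoted case the command runs without error but silently loses its 
arguments, which matches the report that the complete command is not passed.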



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer

2015-11-17 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008780#comment-15008780
 ] 

Yong Tang commented on MESOS-3219:
--

Hi [~tnachen]

I encountered a similar issue when I passed 
--docker_mesos_image=mesosphere/mesos-slave:0.25.0-0.2.70.ubuntu1404
It is the same issue described in this email thread (CC [~gregory90]):
https://www.mail-archive.com/user@mesos.apache.org/msg04975.html

Has there been any progress on this issue? I am happy to provide more details 
or help if needed.

Thanks






[jira] [Commented] (MESOS-3219) Slave recovery issues with Docker containerizer

2015-11-17 Thread Grzegorz Graczyk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008791#comment-15008791
 ] 

Grzegorz Graczyk commented on MESOS-3219:
-

I've provided some more information about this issue here: 
https://github.com/mesosphere/docker-containers/issues/6 . I think I've tried 
everything, and no one has been able to help with this. It's a real blocker 
for me for running Mesos on CoreOS or in a container.



