[jira] [Updated] (MESOS-1915) Docker containers that fail to launch are not killed

Timothy Chen (JIRA) Thu, 23 Oct 2014 10:54:46 -0700

     [ https://issues.apache.org/jira/browse/MESOS-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Timothy Chen updated MESOS-1915:
--------------------------------
    Target Version/s: 0.21.0

> Docker containers that fail to launch are not killed
> ----------------------------------------------------
>
>                 Key: MESOS-1915
>                 URL: https://issues.apache.org/jira/browse/MESOS-1915
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.20.1
>         Environment: Mesos 0.20.1 using the docker executor with a private 
> docker repository. Images often take up to 5 minutes to launch.
> /etc/mesos-slave/executor_registration_timeout is set to '10mins'
>            Reporter: Daniel Hall
>            Assignee: Timothy Chen
>
> When we launch Docker containers on our Mesos cluster using Marathon, we have
> noticed that we end up with several Docker containers running, with only one
> of them actually being tracked by Mesos. When inspected, the containers have
> the same start time.
> This seems to be because Mesos gives up on trying to start the container
> after 1 minute, but fails to clean up the Docker container because it is not
> yet running. Eventually the container starts alongside all the other attempts
> Mesos has made, and we end up with several containers running with only one
> being tracked by Mesos.
> I've pasted some logs from the slave below, filtered for that particular task,
> but it is pretty easy to replicate in our environment, so I'm happy to provide
> further logs, details and analysis as required. This is becoming a bit of a
> problem for us, so we are happy to help as much as possible.
> {noformat}
> Oct 13 04:47:42 mesosslave-1 mesos-slave[16647]: I1013 04:47:42.776945 16661 
> docker.cpp:743] Starting container 'dd113461-4d18-4170-8e3f-9527e6d7f598' for 
> task 'docker-test.11588a48-5294-11e4-adea-42010af0f51e' (and executor 
> 'docker-test.11588a48-5294-11e4-adea-42010af0f51e') of framework 
> '20140918-022627-519434250-5050-6171-0000'
> Oct 13 04:48:42 mesosslave-1 mesos-slave[16647]: E1013 04:48:42.819563 16664 
> slave.cpp:2205] Failed to update resources for container 
> dd113461-4d18-4170-8e3f-9527e6d7f598 of executor 
> docker-test.11588a48-5294-11e4-adea-42010af0f51e running task 
> docker-test.11588a48-5294-11e4-adea-42010af0f51e on status update for 
> terminal task, destroying container: No container found
> Oct 13 04:49:29 mesosslave-1 mesos-slave[16647]: I1013 04:49:29.916460 16665 
> slave.cpp:2538] Monitoring executor 
> 'docker-test.11588a48-5294-11e4-adea-42010af0f51e' of framework 
> '20140918-022627-519434250-5050-6171-0000' in container 
> 'dd113461-4d18-4170-8e3f-9527e6d7f598'
> Oct 13 04:49:31 mesosslave-1 mesos-slave[16647]: I1013 04:49:31.103175 16663 
> docker.cpp:1286] Updated 'cpu.shares' to 102 at 
> /cgroup/cpu/docker/6a581f5c2174dc76bcfb2e5b89fd9a4310732c384d93901a8b37da8aeb700468
>  for container dd113461-4d18-4170-8e3f-9527e6d7f598
> Oct 13 04:49:31 mesosslave-1 mesos-slave[16647]: I1013 04:49:31.105036 16663 
> docker.cpp:1321] Updated 'memory.soft_limit_in_bytes' to 32MB for container 
> dd113461-4d18-4170-8e3f-9527e6d7f598
> {noformat}
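The race described in the report can be illustrated with a toy model. Mesos gives up on the launch after about a minute. The destroy then finds no container because the image is still being pulled, and the container later starts untracked. The sketch below is hypothetical: `ToyContainerizer` and its methods are invented for illustration. They are not Mesos APIs, and this is not the change that was made for MESOS-1915. It shows one way to close the gap. A destroy that arrives while a launch is still in flight is recorded, and the container is torn down as soon as the launch completes instead of being left running.

```python
import threading


class ToyContainerizer:
    """Hypothetical model of a containerizer; names are illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = {}              # container_id -> state, once launched
        self.launching = set()         # launches still in flight (e.g. pulling)
        self.destroy_requested = set() # destroys that arrived mid-launch
        self.killed = []               # containers that were torn down

    def begin_launch(self, container_id):
        with self._lock:
            self.launching.add(container_id)

    def destroy(self, container_id):
        with self._lock:
            if container_id in self.running:
                del self.running[container_id]
                self.killed.append(container_id)
                return "destroyed"
            if container_id in self.launching:
                # The behaviour in the report amounts to answering
                # "No container found" here and forgetting the launch.
                # Instead, remember that this launch must be torn down.
                self.destroy_requested.add(container_id)
                return "destroy deferred"
            return "No container found"

    def finish_launch(self, container_id):
        """Called when the container actually starts; returns True if kept."""
        with self._lock:
            self.launching.discard(container_id)
            if container_id in self.destroy_requested:
                # A destroy arrived while we were launching: kill immediately
                # so the container cannot run untracked.
                self.destroy_requested.discard(container_id)
                self.killed.append(container_id)
                return False
            self.running[container_id] = "running"
            return True
```

Replaying the sequence from the logs: call `begin_launch`, then `destroy` (the timeout) returns `"destroy deferred"`. The late `finish_launch` then kills the container rather than adding it to `running`, so no orphan is left behind.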



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
