Sergey Galkin created MESOS-4999:
------------------------------------

             Summary: Mesos (or Marathon) lost nodes
                 Key: MESOS-4999
                 URL: https://issues.apache.org/jira/browse/MESOS-4999
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.27.2
         Environment: mesos - 0.27.0
marathon - 0.15.2
189 mesos slaves with Ubuntu 14.04.2 on HP ProLiant DL380 Gen9,
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with 
hyperthreading))
RAM - 264G,
Storage - 3.0T on RAID on HP Smart Array P840 Controller,
HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G 2P X710,
            Reporter: Sergey Galkin


After many create/delete cycles of applications with docker instances through 
the Marathon API, I have a lot of lost nodes after the last *deletion of all 
applications in Marathon*.
They fall into three types:
1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the 
slave, and _service docker restart_ on the mesos slave did not fix these tasks.
2. Tasks stay RUNNING because docker hangs and can't delete these instances. There are a lot of
{code}
Killing docker task
Shutting down
Killing docker task
Shutting down
{code}
lines in stdout. _docker stop ID_ hangs, and these tasks can be fixed by 
_service docker restart_ on the mesos slave.
3. Tasks that remain RUNNING even after _service docker restart_ on the mesos slave.
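
To count how many tasks fall into each of the types above, the tasks reported by the master can be grouped by state. The following is only a minimal sketch. It assumes the master's state JSON lists tasks under frameworks[].tasks[] with "id", "state" and "slave_id" fields, and that the STAGED and RUNNING tasks appear there as TASK_STAGING and TASK_RUNNING. The function name and the sample data are hypothetical and are not taken from the affected cluster.

```python
from collections import defaultdict


def group_tasks_by_state(state):
    """Group (task id, slave id) pairs by task state.

    `state` is a parsed master state document (a dict). The layout
    frameworks[].tasks[] is an assumption, see the note above.
    """
    grouped = defaultdict(list)
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            key = task.get("state", "UNKNOWN")
            grouped[key].append((task.get("id"), task.get("slave_id")))
    return dict(grouped)


# Hypothetical sample data, invented for illustration only.
sample_state = {
    "frameworks": [
        {
            "name": "marathon",
            "tasks": [
                {"id": "app-a.1", "state": "TASK_STAGING", "slave_id": "S1"},
                {"id": "app-b.1", "state": "TASK_RUNNING", "slave_id": "S2"},
                {"id": "app-b.2", "state": "TASK_RUNNING", "slave_id": "S1"},
            ],
        }
    ]
}

stuck = group_tasks_by_state(sample_state)
for task_state, tasks in sorted(stuck.items()):
    print(task_state, len(tasks), tasks)
```

In practice the dict would come from the master's state endpoint rather than from a literal. Grouping by slave_id instead of state would show whether the stuck tasks are concentrated on a few slaves.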

Screenshot attached 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
