Sergey Galkin created MESOS-4999:
------------------------------------

             Summary: Mesos (or Marathon) lost nodes
                 Key: MESOS-4999
                 URL: https://issues.apache.org/jira/browse/MESOS-4999
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.27.2
         Environment: mesos - 0.27.0
marathon - 0.15.2
189 mesos slaves with Ubuntu 14.04.2 on HP ProLiant DL380 Gen9,
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
RAM - 264G,
Storage - 3.0T on RAID on HP Smart Array P840 Controller,
HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G 2P X710,
            Reporter: Sergey Galkin

After many create/delete cycles of applications with docker instances through the Marathon API, I have a lot of lost nodes after the last *deletion of all applications in Marathon*.

They fall into three types:

1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the slave, and _service docker restart_ on the mesos slave did not fix them.
2. Tasks stay RUNNING because docker hangs and can't delete these instances (a lot of
{code}
Killing docker task
Shutting down
Killing docker task
Shutting down
{code}
in stdout). _docker stop ID_ hangs, and these tasks can be fixed by _service docker restart_ on the mesos slave.
3. Tasks remain RUNNING after _service docker restart_ on the mesos slave.

Screenshot attached

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
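
To show how many tasks fall into each of the three categories described above, and on which slaves, a tally over the master's state document could help. The following is only a minimal sketch. It assumes the state JSON has already been fetched (for example from the master's `/master/state.json` endpoint) and parsed into a dict. It also assumes the layout as I understand it: a top-level `frameworks` list whose entries carry a `tasks` list with `id`, `slave_id` and `state` fields. The sample data, task IDs and function names are hypothetical. Note that Mesos itself names the staging state `TASK_STAGING`; "STAGED" is presumably how it is displayed.

```python
from collections import Counter, defaultdict

# Hypothetical sample shaped like the parsed master state document.
# The field names (frameworks/tasks/id/slave_id/state) are assumptions.
SAMPLE_STATE = {
    "frameworks": [
        {
            "name": "marathon",
            "tasks": [
                {"id": "app.1", "slave_id": "S1", "state": "TASK_STAGING"},
                {"id": "app.2", "slave_id": "S1", "state": "TASK_RUNNING"},
                {"id": "app.3", "slave_id": "S2", "state": "TASK_RUNNING"},
            ],
        }
    ]
}


def tasks_by_state(state):
    """Group active task IDs by (task state, slave ID)."""
    grouped = defaultdict(list)
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            grouped[(task["state"], task["slave_id"])].append(task["id"])
    return dict(grouped)


def summarize(state):
    """Count active tasks per task state across all slaves."""
    counts = Counter()
    for (task_state, _slave_id), task_ids in tasks_by_state(state).items():
        counts[task_state] += len(task_ids)
    return dict(counts)


# Print the per-state totals, then the per-slave breakdown.
print(summarize(SAMPLE_STATE))
for (task_state, slave_id), task_ids in sorted(tasks_by_state(SAMPLE_STATE).items()):
    print(task_state, slave_id, task_ids)
```

Comparing the task IDs listed for a slave with the output of 'docker ps' on that slave would separate type 1 (no container present) from types 2 and 3.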