[ https://issues.apache.org/jira/browse/MESOS-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206449#comment-15206449 ]
Sergey Galkin edited comment on MESOS-4999 at 3/22/16 2:34 PM:
---------------------------------------------------------------

I see these tasks in Marathon through the Marathon API
{code}
╰─➤ curl -v http://172.20.8.34:8080/env-testing/marathon/v2/tasks
* Trying 172.20.8.34...
* Connected to 172.20.8.34 (172.20.8.34) port 8080 (#0)
> GET /env-testing/marathon/v2/tasks HTTP/1.1
> Host: 172.20.8.34:8080
> User-Agent: curl/7.47.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.4.6 (Ubuntu)
< Date: Tue, 22 Mar 2016 14:29:28 GMT
< Content-Type: application/json; qs=2
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: no-cache, no-store, must-revalidate
< X-Marathon-Leader: http://172.20.9.51:8080
< Expires: 0
< Pragma: no-cache
< X-Marathon-Via: 1.1 172.20.9.52:8080
<
{"tasks":[{"id":"66e562a95c285ce39f37693061a46c2e.394f52eb-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.62","ipAddresses":[],"ports":[8985],"startedAt":"2016-03-22T14:20:27.895Z","stagedAt":"2016-03-22T14:20:26.999Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S12","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394e6888-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.184","ipAddresses":[],"ports":[31332],"startedAt":"2016-03-22T14:20:27.920Z","stagedAt":"2016-03-22T14:20:26.993Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S165","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.3950163c-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.159","ipAddresses":[],"ports":[26568],"startedAt":"2016-03-22T14:20:27.923Z","stagedAt":"2016-03-22T14:20:27.004Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S85","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394eddb9-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.217","ipAddresses":[],"ports":[33140],"startedAt":"2016-03-22T14:20:27.901Z","stagedAt":"2016-03-22T14:20:26.994Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S144","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394e1a67-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.56","ipAddresses":[],"ports":[31705],"startedAt":"2016-03-22T14:20:27.947Z","stagedAt":"2016-03-22T14:20:26.990Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S5","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.3c3e5c8d-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.71","ipAddresses":[],"ports":[35903],"startedAt":"2016-03-22T14:20:32.923Z","stagedAt":"2016-03-22T14:20:31.920Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S30","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394df356-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.221","ipAddresses":[],"ports":[680],"startedAt":"2016-03-22T14:20:27.944Z","stagedAt":"2016-03-22T14:20:26.988Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S80","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394d7e24-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.102","ipAddresses":[],"ports":[20141],"startedAt":"2016-03-22T14:20:27.923Z","stagedAt":"2016-03-22T14:20:26.986Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S41","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394dcc45-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.165","ipAddresses":[],"ports":[16210],"startedAt":"2016-03-22T14:20:27.921Z","stagedAt":"2016-03-22T14:20:26.987Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S136","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]},{"id":"66e562a95c285ce39f37693061a46c2e.394f04ca-f039-11e5-ace3-3cfdfe9c6364","host":"172.20.9.107","ipAddresses":[],"ports":[11494],"startedAt":"2016-03-22T14:20:27.964Z","stagedAt":"2016-03-22T14:20:26.996Z","version":"2016-03-17T14:36:47.536Z","slaveId":"5445dbdc-c58a-4f78-aef2-9ab129a640fa-S146","appId":"/66e562a95c285ce39f37693061a46c2e","servicePorts":[10498]}]}
{code}

> Mesos (or Marathon) lost tasks
> ------------------------------
>
>                 Key: MESOS-4999
>                 URL: https://issues.apache.org/jira/browse/MESOS-4999
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: mesos - 0.27.0
> marathon - 0.15.2
> 189 mesos slaves with Ubuntu 14.04.2 on HP ProLiant DL380 Gen9,
> CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
> RAM - 264G,
> Storage - 3.0T on RAID on HP Smart Array P840 Controller,
> HDD - 12 x HP EH0600JDYTL
> Network - 2 x Intel Corporation Ethernet 10G 2P X710,
>            Reporter: Sergey Galkin
>         Attachments: mesos-nodes.png
>
>
> After many create/delete cycles of applications with docker instances through
> the Marathon API, I have a lot of lost tasks after the last *deletion of all
> applications in Marathon*.
> They fall into three types:
> 1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the
> slave, and _service docker restart_ on the mesos slave did not fix them.
> 2. RUNNING because docker hangs and can't delete these instances (a lot of
> {code}
> Killing docker task
> Shutting down
> Killing docker task
> Shutting down
> {code}
> in stdout). _docker stop ID_ hangs, and these tasks can be fixed by _service
> docker restart_ on the mesos slave.
> 3. RUNNING after _service docker restart_ on the mesos slave.
> Screenshot attached
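
For anyone who wants to cross-check the Marathon task list against 'docker ps' on each slave, here is a rough sketch (not part of the original report) that groups the tasks in a /v2/tasks response by host. The field names (id, host, startedAt) are taken from the response above; the function name, the sample payload and the idea that a task still in staging carries an empty startedAt are assumptions made for illustration.

```python
import json
from collections import defaultdict


def summarize_tasks(payload):
    """Group task IDs by host and list tasks that have no startedAt value."""
    tasks = json.loads(payload)["tasks"]
    by_host = defaultdict(list)
    not_started = []
    for task in tasks:
        by_host[task["host"]].append(task["id"])
        # Assumption: a task that has not left staging has a null or
        # missing startedAt field.
        if not task.get("startedAt"):
            not_started.append(task["id"])
    return dict(by_host), not_started


# Hypothetical, shortened payload that reuses the field names seen in the
# real response; the task IDs are made up.
SAMPLE = json.dumps({"tasks": [
    {"id": "app.task-1", "host": "172.20.9.62",
     "startedAt": "2016-03-22T14:20:27.895Z"},
    {"id": "app.task-2", "host": "172.20.9.62", "startedAt": None},
    {"id": "app.task-3", "host": "172.20.9.184",
     "startedAt": "2016-03-22T14:20:27.920Z"},
]})

if __name__ == "__main__":
    by_host, not_started = summarize_tasks(SAMPLE)
    for host, ids in sorted(by_host.items()):
        print(host, len(ids), ids)
    print("no startedAt:", not_started)
```

The per-host lists could then be compared by hand with the containers that 'docker ps' reports on that slave.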