[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971789#comment-14971789 ]
Niklas Quarfot Nielsen commented on MESOS-3766:
-----------------------------------------------

Thanks [~anandmazumdar]!

[~matth...@mesosphere.io] - I haven't been able to repro yet. How many slaves were you running? Is it mesos-local? Can you repro easily (and maybe enable verbose logging)?

[~anandmazumdar] - do you have time to take this one on?

> Can not kill task in Status STAGING
> -----------------------------------
>
>              Key: MESOS-3766
>              URL: https://issues.apache.org/jira/browse/MESOS-3766
>          Project: Mesos
>       Issue Type: Bug
>       Components: general
> Affects Versions: 0.25.0
>      Environment: OSX
>         Reporter: Matthias Veit
>         Assignee: Niklas Quarfot Nielsen
>      Attachments: master.log.zip, slave.log.zip
>
>
> I have created a simple Marathon Application with instance count 100 (100
> tasks) with a simple sleep command. Before all tasks were running, I killed
> all tasks. This operation was successful, except for 2 tasks. These 2 tasks
> are in state STAGING (according to the mesos UI). Marathon tries to kill those
> tasks every 5 seconds (for over an hour now) - unsuccessfully.
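The reproduction steps described in the report above could be scripted roughly as follows. This is a minimal sketch with several assumptions. Marathon is assumed to be reachable at `http://localhost:8080`, which the report does not state. The sketch uses Marathon's REST endpoints `POST /v2/apps` to create the app and `DELETE /v2/apps/{appId}/tasks` to kill its tasks. The report does not say how the tasks were actually killed. The helper names and the 2-second delay are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumption: where the Marathon instance under test listens (not stated in the report).
MARATHON_URL = "http://localhost:8080"


def build_app(instances: int = 100) -> dict:
    """The app definition from the report, plus the instance count it mentions."""
    return {
        "id": "/app",
        "cmd": "sleep 10000",
        "cpus": 0.1,
        "mem": 16.0,
        "disk": 0.0,
        "instances": instances,
        "env": {"foo": "bla"},
    }


def marathon_request(method: str, path: str, body: Optional[dict] = None) -> int:
    """Send one JSON request to Marathon and return the HTTP status code."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        MARATHON_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def reproduce(delay_s: float = 2.0) -> None:
    """Create the app, then kill all of its tasks before they have all started."""
    app = build_app()
    marathon_request("POST", "/v2/apps", app)
    # Hypothetical delay: meant to be short enough that some tasks are still STAGING.
    time.sleep(delay_s)
    # Kill every task of the app: "/v2/apps" + "/app" + "/tasks".
    marathon_request("DELETE", "/v2/apps" + app["id"] + "/tasks")


if __name__ == "__main__":
    # Only prints the payload; call reproduce() explicitly against a test cluster.
    print(json.dumps(build_app(), indent=2))
```

After calling `reproduce()`, check the Mesos UI for tasks left in STAGING. The delay probably needs tuning to how quickly the cluster launches tasks.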
> I picked one task and grepped the slave log:
> {noformat}
> I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000 with resour
> I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80
> I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr
> I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing executor's forked pid 37096 to '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks
> I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000
> I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame
> I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> .
> .
> .
> I1020 14:11:03.614157 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
> {noformat}
> master log looks like this:
> {noformat}
> I1020 12:39:38.044208 351387648 master.hpp:176] Adding task app.dc98434b-7716-11e5-a5fc-1ea69edef42d with resources cpus(*):0.1; mem(*):16; ports(*):[31232-31232] on slave 80
> I1020 12:39:38.044494 351387648 master.cpp:3248] Launching task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000 (marathon) at
> I1020 12:40:13.061883 350314496 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:18.079074 351387648 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:23.097110 352460800 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:28.117952 352997376 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:33.137667 352460800 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:38.157832 354070528 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> I1020 12:40:43.177223 353533952 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> .
> .
> .
> I1020 14:11:33.611827 353533952 master.cpp:3482] Telling slave 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost) to kill task app.dc98434b-7716-1
> {noformat}
> The sandbox: stdout is empty and stderr has the following content:
> {noformat}
> I1020 12:39:41.551882 2047558400 exec.cpp:134] Version: 0.25.0
> {noformat}
> Just for reference, this was the Marathon Application used:
> {noformat}
> {
>   "id": "/app",
>   "mem": 16.0,
>   "cmd": "sleep 10000",
>   "cpus": 0.1,
>   "disk": 0.0,
>   "env": {
>     "foo": "bla"
>   }
> }
> {noformat}
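The slave excerpt in the report shows the same kill request arriving about every 5 seconds, with no further progress for the task. A small, hypothetical helper like the one below could pull that pattern out of a full `slave.log`. It is not part of Mesos or Marathon. It parses the glog-style prefix visible in the excerpts. For each task, it reports how many times the slave was asked to kill it and the mean interval between those requests. The message text it matches is taken from the excerpt. The `min_requests` threshold is an arbitrary assumption.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# glog line prefix seen in the excerpts:
#   I<MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_LINE = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$")
# Slave-side kill message as it appears in the excerpt (slave.cpp:1789).
SLAVE_KILL = re.compile(r"Asked to kill task (\S+) of framework")


def parse_time(mmdd: str, hms: str) -> datetime:
    # glog omits the year; 2000 is a placeholder (a leap year, so 0229 still parses)
    # and is harmless for interval arithmetic within one log.
    return datetime.strptime(f"2000{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")


def kill_requests(lines):
    """Map task ID -> timestamps at which the slave was asked to kill it."""
    kills = defaultdict(list)
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if not m:
            continue
        k = SLAVE_KILL.search(m.group(3))
        if k:
            kills[k.group(1)].append(parse_time(m.group(1), m.group(2)))
    return kills


def repeated_kills(lines, min_requests=3):
    """Tasks asked to be killed at least `min_requests` times (hypothetical
    threshold), mapped to (request count, mean seconds between requests)."""
    result = {}
    for task, times in kill_requests(lines).items():
        if len(times) >= min_requests:
            gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
            result[task] = (len(times), sum(gaps) / len(gaps))
    return result


if __name__ == "__main__":
    # Usage: python repeated_kills.py slave.log [more logs...]
    for path in sys.argv[1:]:
        with open(path) as f:
            for task, (count, mean_gap) in repeated_kills(f).items():
                print(f"{task}: {count} kill requests, one every {mean_gap:.1f}s on average")
```

If you run this against the attached slave log, a task caught in the loop described above would be expected to show a mean gap close to 5 seconds. A task whose kill went through should not appear at all, or should appear with only a few requests.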