[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sathish Kumar updated MESOS-6952:
---------------------------------
    Description:
A task was stuck in the staging state for almost 6 hours, even after its executor had terminated on the slave. Because the task was stuck in staging, the framework did not receive an update from mesos-master.

The issue was resolved by restarting the slave, after which the task moved from staging to the task lost state.

In the slave logs I can see the warning "Asked to run task ... which is terminating/terminated":
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
{noformat}
Full log of the slave:
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.859004 107759 slave.cpp:1711] Queuing task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.939483 107759 slave.cpp:1863] Sending queued task 'ct:1484816820000:0:foocare_zendesk_round_robin:' to executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141394 107762 slave.cpp:3871] Executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 exited with status 0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141451 107762 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from @0.0.0.0:0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141849 107762 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.141989 107762 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:16.147343 107766 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089175 107759 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:1484816820000:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097568 107774 slave.cpp:3012] Handling status update TASK_LOST (UUID: b999fb64-34f0-496d-be19-f5a7f998230e) for task ct:1484816820000:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from @0.0.0.0:0
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097633 107774 slave.cpp:3975] Cleaning up executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 at executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097790 107772 gc.cpp:55] Scheduling '/data/mesos/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:/runs/6b8922ff-3f57-42a0-97d1-d79c1de3d93b' for gc 6.99999886874074days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097836 107772 gc.cpp:55] Scheduling '/data/mesos/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:' for gc 6.99999886832296days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097869 107772 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:/runs/6b8922ff-3f57-42a0-97d1-d79c1de3d93b' for gc 6.99999886819259days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097888 107772 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56/frameworks/19393553-2061-4d2f-8c05-a0ba688334f4-0001/executors/ct:1484816820000:0:foocare_zendesk_round_robin:' for gc 6.99999886809185days in the future
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.WARNING.20161004-154318.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:1484816820000:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:1484816820000:0:foocare_zendesk_round_robin:' which is terminating/terminated
{noformat}
Master logs
{noformat}

  was: the same description, without the trailing "Master logs" section.

> Mesos task state was stuck in staging even after executor terminated
> --------------------------------------------------------------------
>
>                 Key: MESOS-6952
>                 URL: https://issues.apache.org/jira/browse/MESOS-6952
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 0.28.2
>         Environment: ubuntu 14.04
>            Reporter: Sathish Kumar