[ https://issues.apache.org/jira/browse/MESOS-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494051#comment-15494051 ]
Joseph Wu commented on MESOS-6026:
----------------------------------

There shouldn't be a problem with backporting this, as long as you also cherry-pick the review preceding r51404.

We cherry-picked this onto a branch of 1.0.0 for an internal DC/OS test cluster. It was deployed for about 2 weeks, and we monitored some of the example frameworks during that time.

> Tasks mistakenly marked as FAILED due to race b/w sendExecutorTerminatedStatusUpdate() and _statusUpdate()
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-6026
>                 URL: https://issues.apache.org/jira/browse/MESOS-6026
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Kapil Arya
>            Assignee: Benjamin Mahler
>              Labels: mesosphere
>             Fix For: 1.1.0
>
>
> Tasks can be mistakenly marked as FAILED due to a race between sendExecutorTerminatedStatusUpdate() and _statusUpdate(). The race happens when the task has just finished and the executor is exiting.
> Here is an example of slave log messages:
> {code}
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959374 20418 slave.cpp:3211] Handling status update TASK_FINISHED (UUID: fd79d0bd-4ece-41dc-bced-b93491f6bb2e) for task 291 of framework 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from executor(1)@10.10.0.205:53504
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959604 20418 slave.cpp:3732] executor(1)@10.10.0.205:53504 exited
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959643 20418 slave.cpp:4089] Executor '291' of framework 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 exited with status 0
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959744 20418 slave.cpp:3211] Handling status update TASK_FAILED (UUID: b94722fb-1658-4936-b604-6d642ffe20a0) for task 291 of framework 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from @0.0.0.0:0
> {code}
> As the log shows, the task is marked as TASK_FAILED after the executor has exited, even though a TASK_FINISHED update for the same task was already being handled.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
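
For anyone who wants to check their own slave logs for the symptom described in the issue, here is a minimal sketch that flags tasks receiving two different terminal status updates. It is not part of Mesos. The function name `find_conflicting_updates`, the regular expression, and the `TERMINAL` set are assumptions modeled only on the "Handling status update" lines quoted above, and the set of terminal states may not be exhaustive for your Mesos version.

```python
import re

# Pattern modeled on the "Handling status update" lines quoted in the issue.
# Assumption: other Mesos versions may format this message differently.
UPDATE_RE = re.compile(
    r"Handling status update (?P<state>TASK_\w+) \(UUID: (?P<uuid>[0-9a-f-]+)\) "
    r"for task (?P<task>\S+) of framework (?P<framework>\S+)"
)

# Assumption: the terminal states of interest for this check (may be incomplete).
TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"}


def find_conflicting_updates(lines):
    """Return (framework, task, first_state, later_state) tuples for tasks that
    reported one terminal state and were later given a different one."""
    first_terminal = {}
    conflicts = []
    for line in lines:
        m = UPDATE_RE.search(line)
        if m is None or m["state"] not in TERMINAL:
            continue
        key = (m["framework"], m["task"])
        # Remember the first terminal state seen for this task.
        previous = first_terminal.setdefault(key, m["state"])
        if previous != m["state"]:
            conflicts.append((key[0], key[1], previous, m["state"]))
    return conflicts


# The two status-update lines and the executor-exit line from the issue.
SAMPLE = [
    "I0810 21:32:53.959374 20418 slave.cpp:3211] Handling status update "
    "TASK_FINISHED (UUID: fd79d0bd-4ece-41dc-bced-b93491f6bb2e) for task 291 "
    "of framework 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 "
    "from executor(1)@10.10.0.205:53504",
    "I0810 21:32:53.959604 20418 slave.cpp:3732] "
    "executor(1)@10.10.0.205:53504 exited",
    "I0810 21:32:53.959744 20418 slave.cpp:3211] Handling status update "
    "TASK_FAILED (UUID: b94722fb-1658-4936-b604-6d642ffe20a0) for task 291 "
    "of framework 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from @0.0.0.0:0",
]

if __name__ == "__main__":
    for framework, task, first, later in find_conflicting_updates(SAMPLE):
        print(f"task {task} of framework {framework}: {first} then {later}")
```

In practice you would pass an open log file instead of `SAMPLE`, since the function accepts any iterable of lines. A hit is only a hint that the race may have occurred, not proof of it.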