Andrei Budnik created MESOS-10158:
-------------------------------------
Summary: Mesos Agent gets stuck in Draining due to pending
unacknowledged status updates
Key: MESOS-10158
URL: https://issues.apache.org/jira/browse/MESOS-10158
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Andrei Budnik
A Mesos agent can get stuck in the DRAINING state because of pending
unacknowledged status updates. When a framework becomes disconnected, the
agent keeps sending task status updates for the terminated tasks of that
framework, and the disconnected framework cannot acknowledge them. Because
the master transitions the agent from DRAINING to DRAINED only after all
task status updates have been acknowledged, the agent never leaves the
DRAINING state.
This problem can be resolved by sending a ["Teardown"
operation|https://github.com/apache/mesos/blob/8ce5d30808f3744eeded09d530f226079d569a94/include/mesos/v1/master/master.proto#L299-L303]
for each lost framework. However, it would be much better if the master
handled this situation automatically. At the very least, we should make it
easier for an operator to find out what prevents the draining operation from
completing.
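The manual workaround could be scripted roughly as follows. This is only a sketch, not a tested tool: it assumes the master's v1 operator API is reachable at `<master>/api/v1`, that it accepts the JSON form of the "Teardown" call linked above, and that the operator already knows the IDs of the lost frameworks. The master URL, the framework ID, and all helper names are hypothetical, and authentication is not handled.

```python
import json
import urllib.request


def build_teardown_call(framework_id):
    """Build the JSON body of a v1 operator API TEARDOWN call."""
    return {
        "type": "TEARDOWN",
        "teardown": {"framework_id": {"value": framework_id}},
    }


def build_request(master_url, framework_id):
    """Wrap the TEARDOWN call in a POST request to <master_url>/api/v1."""
    body = json.dumps(build_teardown_call(framework_id)).encode("utf-8")
    return urllib.request.Request(
        master_url.rstrip("/") + "/api/v1",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def teardown_frameworks(master_url, framework_ids):
    """Send one TEARDOWN call per framework ID and print the HTTP status."""
    for framework_id in framework_ids:
        request = build_request(master_url, framework_id)
        with urllib.request.urlopen(request) as response:
            print(framework_id, response.status)


# Example (hypothetical master URL and framework ID, not executed here):
# teardown_frameworks("http://master.example.com:5050", ["<framework-id>"])
```

Tearing down a framework is destructive, because it shuts the framework down along with its tasks, so a script like this should only be given frameworks that are known to be gone for good.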