Hi, We are running into an issue with slave status update manager. Below is the behavior I am seeing.
Our use case is, we run Stateful container (Cassandra process), here Executor polls JMX port at 60 second interval to get Cassandra State and sends the state to agent -> master -> framework. *RUNNING Cassandra Process translates to TASK_RUNNING.* *CRASHED or DRAINED Cassandra Process translates to TASK_FAILED.* At some point slave has multiple TASK_RUNNING status updates in stream and then followed by TASK_FAILED if acknowledgements are pending. We use explicit acknowledgements, and I see Mesos Master receives, all TASK_RUNNING and then TASK_FAILED as well as Framework also receives all TASK_RUNNING updates followed up TASK_FAILED. After receiving TASK_FAILED, Framework restarts different executor on same machine using old persistent volume. Issue is that, *old executor reference is hold by slave* (assuming it did not receive acknowledgement, whereas master and scheduler have processed the status updates), so it continues to retry TASK_RUNNING infinitely. Here, old executor process is not running. As well as new executor process is running, and continues to work as-is. This makes be believe, some bug with slave status update manager. I read slave status update manager code, recover <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L203> has a constraint <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L239> to ignore status updates from stream if the last executor run is completed. I think, similar constraint should be applicable for status update <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L318> and acknowledge <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L760> . Thanks, Varun On Fri, Mar 16, 2018 at 7:47 PM, Benjamin Mahler <bmah...@apache.org> wrote: > (1) Assuming you're referring to the scheduler's acknowledgement of a > status update, the agent will not forward TS2 until TS1 has been > acknowledged. So, TS2 will not be acknowledged before TS1 is acknowledged. > FWICT, we'll ignore any violation of this ordering and log a warning. > > (2) To reverse the question, why would it make sense to ignore them? > Assuming you're looking to reduce the number of round trips needed for > schedulers to see the terminal update, I would point you to: > https://issues.apache.org/jira/browse/MESOS-6941 > > (3) When the agent sees an executor terminate, it will transition all > non-terminal tasks assigned to that executor to TASK_GONE (partition aware > framework) or TASK_LOST (non partition aware framework) or TASK_FAILED (if > the container OOMed). There may be other cases, it looks a bit convoluted > to me. > > On Thu, Mar 15, 2018 at 10:35 AM, Zhitao Li <zhitaoli...@gmail.com> wrote: > > > Hi, > > > > While designing the correct behavior with one of our framework, we > > encounters some questions about behavior of status update: > > > > The executor continuously polls the workload probe to get current mode of > > workload (a Cassandra server), and send various status update states > > (STARTING, RUNNING, FAILED, etc). > > > > Executor polls every 30 seconds, and sends status update. Here, we are > > seeing congestion on task update acknowledgements somewhere (still > > unknown). > > > > There are three scenarios that we want to understand. > > > > 1. Agent queue has task update TS1, TS2 & TS3 (in this order) waiting > on > > acknowledgement. Suppose if TS2 receives an acknowledgement, then what > > will > > happen to TS1 update in the queue. > > > > > > 1. Agent queue has task update TS1, TS2, TS3 & TASK_FAILED. Here, TS1, > > TS2, TS3 are non-terminial updates. Once the agent has received a > > terminal > > status update, does it makes sense to ignore non-terminal updates in > the > > queue? > > > > > > 1. As per Executor Driver code comment > > <https://github.com/apache/mesos/blob/master/src/java/src/ > > org/apache/mesos/ExecutorDriver.java#L86>, > > if the executor is terminated, does agent send TASK_LOST? If so, does > it > > send once or for each unacknowledged status update? > > > > > > I'll study the code in status update manager and agent separately but > some > > official answer will definitely help. > > > > Many thanks! > > > > -- > > Cheers, > > > > Zhitao Li > > >