Question on status update retry in agent

Zhitao Li Thu, 15 Mar 2018 10:36:05 -0700

Hi,

While designing the correct behavior for one of our frameworks, we
ran into some questions about the behavior of status updates:


The executor continuously polls the workload probe to get the current mode
of the workload (a Cassandra server), and sends status updates with various
states (STARTING, RUNNING, FAILED, etc).

The executor polls every 30 seconds and sends a status update each time. We
are seeing congestion on task update acknowledgements somewhere (where
exactly is still unknown).

There are three scenarios that we want to understand.

   1. The agent queue has task updates TS1, TS2 & TS3 (in this order) waiting
   on acknowledgement. If TS2 receives an acknowledgement, what happens to
   the TS1 update in the queue?


   2. The agent queue has task updates TS1, TS2, TS3 & TASK_FAILED. Here, TS1,
   TS2 and TS3 are non-terminal updates. Once the agent has received a
   terminal status update, does it make sense to ignore the non-terminal
   updates in the queue?


   3. As per the ExecutorDriver code comment
   <https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/ExecutorDriver.java#L86>,
   if the executor is terminated, does the agent send TASK_LOST? If so, does
   it send it once, or once for each unacknowledged status update?
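
To make scenario 1 concrete, here is a toy Python model of the queue as we
currently picture it. This is only a sketch: the class and method names are
made up for illustration, and the rule that an acknowledgement is accepted
only for the update at the front of the queue is our assumption, not
something we have confirmed in the Mesos code.

```python
from collections import deque
import uuid


class PendingUpdates:
    """Toy model of one task's queue of unacknowledged status updates.

    Assumption (not taken from the Mesos source): an acknowledgement is
    accepted only if it names the update at the front of the queue.
    """

    def __init__(self):
        self._queue = deque()  # (update_id, state) pairs, oldest first

    def enqueue(self, state):
        """Append a new update and return its generated id."""
        update_id = str(uuid.uuid4())
        self._queue.append((update_id, state))
        return update_id

    def acknowledge(self, update_id):
        """Pop the front update if the ack matches it; otherwise ignore it."""
        if self._queue and self._queue[0][0] == update_id:
            self._queue.popleft()
            return True
        return False

    def pending_states(self):
        """Return the states still waiting on acknowledgement, oldest first."""
        return [state for _, state in self._queue]


queue = PendingUpdates()
ts1 = queue.enqueue("TS1")
ts2 = queue.enqueue("TS2")
ts3 = queue.enqueue("TS3")

# Scenario 1: an acknowledgement for TS2 arrives while TS1 is still pending.
accepted = queue.acknowledge(ts2)
```

Under this assumed rule the TS2 acknowledgement is dropped and TS1 stays at
the front of the queue. Whether the agent really behaves this way is exactly
what we would like to have confirmed.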


I'll study the code in the status update manager and the agent separately,
but an official answer would definitely help.

Many thanks!

-- 
Cheers,

Zhitao Li
