> On Sept. 6, 2016, 11:33 a.m., Vinod Kone wrote:
> > src/launcher/executor.cpp, line 682
> > <https://reviews.apache.org/r/46187/diff/5/?file=1488422#file1488422line682>
> >
> >     hmm. what's the guarantee that an HTTP based executor receives an ACK 
> > within a second? what if the agent is down?
> 
> Qian Zhang wrote:
>     If the executor does not receive an ACK from the agent within 1 second,
> something is probably wrong with the agent, so with this code the executor
> will exit with -1 and a message so that we can catch that situation. Maybe
> we should increase this timeout a bit (e.g., 3 seconds) to be safer?

Having the executor exit with -1 when an agent is down or restarting (probably
for an upgrade) seems unfortunate. Since we allow agents to be down for up to
"MESOS_RECOVERY_TIMEOUT" (default 15 mins) if "MESOS_CHECKPOINT" is set, maybe
the command executor could wait for "MESOS_RECOVERY_TIMEOUT" if it is
disconnected? If it is connected then yes, it probably should wait for less
time (1s is too short?) and then exit, because that suggests there is
something wrong with the agent. Does that make sense?
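
Roughly what I have in mind, as a minimal sketch. The helper name
`ackWaitTimeout` and its parameters are hypothetical and not part of the
current src/launcher/executor.cpp; how the executor learns whether it is
connected and what the two environment variables are set to is left out. The
fallback for "disconnected but not checkpointing" is my assumption, since the
suggestion above only covers the checkpointing case.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not in the actual executor.cpp): picks how long the
// command executor waits for the ACK of its terminal status update before
// giving up and exiting.
std::chrono::seconds ackWaitTimeout(
    bool connected,                          // currently connected to the agent?
    bool checkpointing,                      // is "MESOS_CHECKPOINT" set?
    std::chrono::seconds recoveryTimeout,    // "MESOS_RECOVERY_TIMEOUT"
    std::chrono::seconds connectedTimeout)   // short timeout, e.g. a few seconds
{
  assert(recoveryTimeout.count() >= 0 && connectedTimeout.count() >= 0);

  // Disconnected with checkpointing on: the agent may be down or restarting
  // (e.g., for an upgrade), so wait for as long as it is allowed to be down.
  if (!connected && checkpointing) {
    return recoveryTimeout;
  }

  // Connected but no ACK: something is likely wrong with the agent, so only
  // wait briefly. Assumption: the same short timeout is used when the
  // executor is disconnected and checkpointing is off.
  return connectedTimeout;
}
```

With the default 15 minute recovery timeout, a disconnected checkpointing
executor would wait the full 15 minutes, while a connected one would exit
after whatever short timeout we settle on.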


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review147808
-----------------------------------------------------------


On Sept. 7, 2016, 2:03 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2016, 2:03 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-5276
>     https://issues.apache.org/jira/browse/MESOS-5276
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp 5370634ef9e6f3ac9717fed71f6a77707026a16a 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>
