> On April 15, 2016, 12:49 a.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > <https://reviews.apache.org/r/46187/diff/1/?file=1343828#file1343828line749>
> >
> > Looking at slave::statusUpdate() code there are several scenarios where
> > the slave ignores a status update sent by the executor; this means this
> > executor could end up never terminating.
> >
> > Can you do the following:
> >
> > --> Enqueue a message in the queue to self terminate after some timeout
> > (you can use the delay() function) to be safe.
> >
> > --> Add a TODO that we do this to be safe and also because slave
> > sometimes doesn't ACK a status update. Link to a ticket that fixes the
> > slave status update semantics to always ACK a status update sent by an
> > executor.
> >
> > sounds good?
>
> Vinod Kone wrote:
>     @Qian, any update on this? If this particular review is going to take
>     some time, I think it is still useful to commit the other 2 reviews in this
>     chain. AFAICT, they are independent of this review?
@Vinod, sorry for the late reply. I have filed a ticket (https://issues.apache.org/jira/browse/MESOS-5262) for enhancing `Slave::statusUpdate()` to always ACK the status update sent by the executor.

Can you please elaborate on the specific scenarios in which this executor could end up never terminating? Originally I thought the scenario would be: the executor sends a terminal status update to the slave while the corresponding framework is in the `TERMINATING` state (e.g., the operator tears down the framework); then `Slave::statusUpdate()` ignores this status update, so the executor never gets the ACK. But after testing, I found that in this case the executor still terminates, because the container corresponding to this executor is destroyed by `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()`, so after `--executor_shutdown_grace_period` the executor is terminated anyway. So I am not sure in which case the executor would never terminate.

And yes, the other 2 patches are independent of this one. I will remove their dependency on this one in the review board, thanks!

- Qian

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
-----------------------------------------------------------

On April 14, 2016, 1:17 p.m., Qian Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> -----------------------------------------------------------
>
> (Updated April 14, 2016, 1:17 p.m.)
>
>
> Review request for mesos, Anand Mazumdar and Vinod Kone.
>
>
> Bugs: MESOS-3558
>     https://issues.apache.org/jira/browse/MESOS-3558
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Terminate when receiving the ACK of terminal status update.
>
>
> Diffs
> -----
>
>   src/launcher/http_command_executor.cpp ad484e0e6f5067b6c166111c91b1ff1e8c05d9ac
>
> Diff: https://reviews.apache.org/r/46187/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Qian Zhang
>
>
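
For illustration only, the fallback discussed in this thread (terminate when the ACK of the terminal status update arrives, but also terminate after a timeout in case the ACK never comes) might be sketched as below. This is not the code under review: `TerminalUpdateWaiter`, `acknowledged()` and `waitForAck()` are hypothetical names, and the sketch uses a standard-library condition variable in place of the libprocess `delay()` call suggested in the review.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the executor's "wait for the terminal ACK" step.
// Not Mesos code; names are invented for this sketch.
class TerminalUpdateWaiter {
public:
  // Called when the ACK for the terminal status update is received.
  void acknowledged() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acked_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the ACK arrives or `timeout` elapses, whichever is first.
  // Returns true if the ACK was received, false if the timeout fired.
  // In either case the caller would go on to terminate.
  bool waitForAck(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return acked_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool acked_ = false;  // Guarded by mutex_.
};
```

A caller would send the terminal status update, call `waitForAck()` with some safety timeout, and then exit regardless of the return value; the return value only tells it whether the exit was triggered by the ACK or by the timeout.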