> On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/exec/exec.cpp, lines 75-76 > > <https://reviews.apache.org/r/25434/diff/4/?file=709150#file709150line75> > > > > Why Java executors? I am not sure I understand this comment.
This is what BenH pointed me at. Since Java has garbage collection, an Executor written in java may be garbage collected during the graceful shutdown period. We spawn a separate process exactly to target this problem. > On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/slave/constants.hpp, lines 51-59 > > <https://reviews.apache.org/r/25434/diff/4/?file=709152#file709152line51> > > > > Can we add something similar to the flags help text? The flag is > > already a bit misleading, so it would be helpful to expand on the new > > funtionality there. Agreed, definitely makes sense. > On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/slave/utils.hpp, lines 29-34 > > <https://reviews.apache.org/r/25434/diff/4/?file=709160#file709160line29> > > > > Can you include an example? How about the sequence charts from the > > JIRA? The logic of adjustShutdownTimeout is still not crystal clear to me. > > To put it in another way; it is hard for me to tell whether we cover > > all corner cases. The approach I took gives two guarantees: a timeout is always nonnegative and not greater than the parent one. We can get a situation when the timeout at one or more levels is ridiculously small, which effectively disables the graceful shutdown. Do you have any specific concerns? I have added the sequence chart and expanded the comment a bit. > On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/slave/utils.hpp, line 35 > > <https://reviews.apache.org/r/25434/diff/4/?file=709160#file709160line35> > > > > How about 'getShutdownTimeout'? It is not really adjusting, but rather > > partition the total timeout into sub-timeouts - correct? > > > > Also, we can hide the implementation adjustShutdownTimeout in utils.cpp > > so we only export adjustExecutorShutdownTimeout and > > adjustCommandExecutorShutdownTimeout (which could be called > > 'getExecutorShutdownTimeout'?) 'getShutdownTimeout' implies the timeout is not calculated on the fly based on input parameters. 'getShutdownTimeoutForLevel' is better, but too long. How about 'calculateShutdownTimeout' for the basic function and 'get...' for the specific ones? I would also leave the basic function publicly accessible for potential use in other places and levels. > On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/slave/utils.cpp, line 42 > > <https://reviews.apache.org/r/25434/diff/4/?file=709161#file709161line42> > > > > Do we want to have checks/asserts to ensure we have positive/non-zero > > timeouts? I think currently it's difficult to supply a negative duration, but agree, let's do it for the future. > On Oct. 3, 2014, 5:25 p.m., Niklas Nielsen wrote: > > src/tests/mesos.hpp, lines 865-868 > > <https://reviews.apache.org/r/25434/diff/4/?file=709163#file709163line865> > > > > You only use this in MesosExecutorForceShutdown - how about moving it > > to slave_tests.cpp instead? > > > > How about using 'statusMatchesTask' or 'statusHasTaskId'. 'relatedTo' > > seems a bit informal/imprecise. Changed name to `statusMatchesTask ()`. We do not have utility functions in `slave_tests.cpp`, and the predicate is rather generic, so I would leave it in the `tests/mesos.hpp` where it can be found faster by the next one who needs the same functionality. I remove this test from this request and move it to the newly created [JIRA ticket](https://issues.apache.org/jira/browse/MESOS-1871). - Alexander ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25434/#review55348 ----------------------------------------------------------- On Oct. 2, 2014, 7:17 a.m., Alexander Rukletsov wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/25434/ > ----------------------------------------------------------- > > (Updated Oct. 2, 2014, 7:17 a.m.) > > > Review request for mesos, Benjamin Hindman, Niklas Nielsen, Till Toenshoff, > and Timothy St. Clair. > > > Bugs: MESOS-1571 > https://issues.apache.org/jira/browse/MESOS-1571 > > > Repository: mesos-git > > > Description > ------- > > The configurable slave's executor_shutdown_grace_period flag is propagated to > Executor and CommandExecutor through an environment variable. Shutdown > timeout in Executor and signal escalation timeout in CommandExecutor are now > dependent on this flag. Each nested timeout is somewhat shorter than the > parent one. > > > Diffs > ----- > > src/Makefile.am 27c42df > src/exec/exec.cpp e15f834 > src/launcher/executor.cpp cbc8750 > src/slave/constants.hpp 9030871 > src/slave/constants.cpp e1da5c0 > src/slave/containerizer/containerizer.hpp 8a66412 > src/slave/containerizer/containerizer.cpp 0254679 > src/slave/containerizer/docker.cpp 9a29489 > src/slave/containerizer/external_containerizer.cpp efbc68f > src/slave/containerizer/mesos/containerizer.cpp 9d08329 > src/slave/flags.hpp 32e51d2 > src/slave/utils.hpp PRE-CREATION > src/slave/utils.cpp PRE-CREATION > src/tests/containerizer.cpp a17e1e0 > src/tests/mesos.hpp 957e223 > src/tests/slave_tests.cpp 69be28f > > Diff: https://reviews.apache.org/r/25434/diff/ > > > Testing > ------- > > make check (OS X 10.9.4) > > WIP: digging into the failure of the SlaveTest.MesosExecutorForceShutdown > test revealed an issue with signal escalation in CommandExecutor. That needs > more time to be resolved. > > > Thanks, > > Alexander Rukletsov > >
