Re: Graceful Shutdown Design

Niklas Nielsen Wed, 12 Nov 2014 15:08:16 -0800

I thought of signal escalation as per-executor, or actually as applying
everywhere we execute a command info as a subprocess.
The new grace period is meant as the time an executor has to finish off
its work; the other timeouts had to be changed because in most cases they
would otherwise be shorter.
For custom executors, it is up to the executor itself to honor the timeout;
otherwise, the executor process will kill it after timeout + delta.
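
To make the escalation concrete, here is a minimal Python sketch of the
idea (not Mesos code; the function name and the grace_period parameter are
made up for illustration, and POSIX signals are assumed): send SIGTERM, wait
up to the grace period, and only then fall back to SIGKILL.

```python
import signal
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace_period: float) -> int:
    """Stop proc politely first, forcefully if the grace period runs out.

    Hypothetical helper for illustration only. Returns the exit status of
    the child; on POSIX a negative value -N means "terminated by signal N".
    """
    # Step 1: ask the subprocess to shut down cleanly (SIGTERM on POSIX).
    proc.terminate()
    try:
        # Step 2: give it up to grace_period seconds to finish its work.
        return proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        # Step 3: the grace period expired, so escalate to SIGKILL,
        # which the subprocess cannot catch or ignore.
        proc.kill()
        return proc.wait()
```

An outer level that supervises this one would need a longer deadline, for
example grace_period + delta, so that the inner escalation gets a chance to
complete before the outer level steps in.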


Ben, are you thinking of a more generalized finalization mechanism
(pluggable, programmable)?

Niklas

On 11 November 2014 10:34, Alex Rukletsov <a...@mesosphere.io> wrote:

> Ben,
>
> there are two scenarios: executor shutdown and killTask() in
> CommandExecutor. In the first use case, each custom executor is affected
> through the ExecutorProcess, which means two levels are involved
> (containerizer and executor) and they should be synchronized.
>
> In the second scenario, each task is tied to its own CommandExecutor,
> therefore killing a task implies killing its executor. In this case, the
> graceful shutdown period also becomes a signal escalation timeout, and
> conflating the two is, I think, a good idea. The proposed design doc is an
> effort to align timeouts along the chain from slave to CommandExecutor.
>
> If I understand you correctly, we want to shut down any executor (task)
> gracefully, and not tie the grace period to CommandExecutor only. A good
> example pointed out by Ankur Chauhan is MESOS-1925
> <https://issues.apache.org/jira/browse/MESOS-1925>: we can reuse the
> same graceful shutdown flag for Docker. And if we later enable frameworks
> to adjust timeouts for their tasks (or executors, to be precise), we will
> be able to align the timeout used by docker finalization with the timeout
> in the docker container.
>
> On Mon, Nov 10, 2014 at 10:00 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>
> > I'm guessing most of the motivation here is actually for task killing
> > escalation in the command executor? The shutdown grace period was designed
> > for executor shutdown only, which today occurs only when the framework is
> > being shut down (or recovery is cleaning up), or in the future, when
> > frameworks ask to shut down a specific executor.
> >
> > In the case of the command executor, the slave won't do any escalation when
> > a killTask arrives, since it's not trying to shut down the executor. For
> > simplicity (I'm guessing), we conflated the executor shutdown grace period
> > with the killTask signal escalation in the command executor.
> >
> > So, I'm still trying to figure out the concrete use case here: is it that
> > you have command-tasks that implement a clean shutdown driven by SIGTERM?
> > Going forward, is that enough or would we want a more general notion of
> > "Finalization" (e.g. driven by HTTP, or SIGTERM, or subprocess, etc.), much
> > like the generic health checking that was added?
> >
> > On Mon, Nov 10, 2014 at 8:08 AM, Alex Rukletsov <a...@mesosphere.io> wrote:
> >
> > > Hi all,
> > >
> > > I would like to share the design doc for configurable grace period
> > > <https://docs.google.com/document/d/1_b3OPv3tjkub1T6VhQ27GnDfbVjnJ6IQ4ufPQhV1HM8/edit?usp=sharing>.
> > > The doc describes two approaches to calculate nested grace periods, points
> > > out implementation details and opens several design questions.
> > >
> > > I would highly appreciate any thoughts, ideas and suggestions!
> > >
> > > Thanks,
> > > Alex
> > >
> >
>
