Re: API review: max_duration on TaskInfo

Zhitao Li Wed, 28 Mar 2018 09:17:39 -0700

A quick update: James and I think the name `max_completion_time` is a bit
better than `max_duration`. The semantics should remain the same.
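
For illustration, here is a minimal sketch of the deadline-to-duration
translation mentioned in the original proposal quoted below: a framework that
thinks in absolute deadlines uses its own clock as the source of truth and
hands over only a relative duration. The function name and the use of seconds
are assumptions made for this example, not part of the Mesos API, and how the
value is attached to `TaskInfo` is not shown.

```python
import time
from typing import Optional


def relative_duration_secs(deadline_epoch_secs: float,
                           now_epoch_secs: Optional[float] = None) -> float:
    """Translate an absolute deadline into a relative duration.

    Hypothetical helper: the framework reads its own clock once, so clock
    skew on other hosts does not affect the computed duration.
    """
    if now_epoch_secs is None:
        # The framework's own clock is the source of truth.
        now_epoch_secs = time.time()

    remaining = deadline_epoch_secs - now_epoch_secs
    if remaining <= 0:
        # The deadline has already passed; there is nothing sensible to launch.
        raise ValueError("deadline has already passed")
    return remaining


# Example: a deadline 100 seconds after a fixed "now" yields a 100s duration.
print(relative_duration_secs(1100.0, now_epoch_secs=1000.0))
```

The point of the sketch is that only the difference is sent along with the
task, so the hosts running the task never need to agree with the framework on
wall-clock time.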


On Mon, Mar 26, 2018 at 9:52 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Hi Benjamin,
>
> James and I did a quick search of some existing systems. We can dig
> deeper into their semantics.
>
> Kubernetes has a feature called activeDeadlineSeconds
> <https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/>,
> although it seems to cover total scheduling time rather than container run
> time (which makes sense, since K8s itself is an end-to-end scheduler).
>
> The BMC Server Automation system has something similar to the above, called
> JOB_TIMEOUT
> <https://docs.bmc.com/docs/ServerAutomation/86/using/managing-jobs/defining-timeouts-for-jobs>.
>
> YARN/Hadoop defines a couple of configuration options suffixed with `timeout`
> (`mapreduce.task.timeout` and related ones in this doc
> <https://hadoop.apache.org/docs/r2.4.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml>),
> although they also seem to model some aspects of a "health check". There is
> also some research work on a deadline scheduler
> <https://www.researchgate.net/publication/267752723_Hadoop_Scheduler_with_Deadline_Constraint>
> in Hadoop, but I have not found out whether that work has made it into an
> open source implementation.
>
> Wrapping a `timeout` command is definitely one possibility, but it seems a
> bit hacky to me and also lacks proper reporting and tracking. If we could
> support this feature w/o too much complexity, I think it's still attractive.
>
> Please let me know your opinion. Thanks.
>
> On Fri, Mar 23, 2018 at 3:33 PM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
>> In the interest of doing our due diligence, have you studied any prior
>> art?
>>
>> For example, I was surprised to notice that htcondor doesn't really
>> provide this as a first class thing:
>> https://lists.cs.wisc.edu/archive/htcondor-users/2006-November/msg00024.shtml
>>
>> I didn't see it in any other systems I looked at either, with people
>> suggesting wrapping commands with the 'timeout' command. I suspect most
>> systems have the user do this on their own with a simple timeout wrapper
>> script?
>>
>> On Fri, Mar 23, 2018 at 2:21 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>>
>> > Hi everyone,
>> >
>> > I'd like to do an API review for MESOS-8725
>> > <https://issues.apache.org/jira/browse/MESOS-8725>. We are adding an
>> > optional `max_duration` field to `TaskInfo`. If a task does not terminate
>> > within this duration, built-in executors will kill the task with a new
>> > reason `REASON_MAX_DURATION_REACHED`.
>> >
>> > Proof of concept patch:
>> > https://reviews.apache.org/r/66258/
>> >
>> > Reference implementation in command executor:
>> > https://reviews.apache.org/r/66259/
>> >
>> > A design choice we made is to make this a relative duration rather than an
>> > absolute deadline timestamp. Our rationale:
>> >
>> >    - A cluster could suffer from clock skew, so the same absolute deadline
>> >    would result in inconsistent behavior;
>> >    - A framework can trivially use its own clock as the source of truth
>> >    and translate an absolute deadline into current time + max_duration.
>> >
>> > Please let me know what you think. Thanks.
>> >
>> > --
>> > Cheers,
>> >
>> > Zhitao Li
>> >
>>
>
>
>
> --
> Cheers,
>
> Zhitao Li
>



-- 
Cheers,

Zhitao Li
