> On Oct. 7, 2014, 6:50 p.m., Vinod Kone wrote:
> > OK. Here is a proposal for what it could look like.
> > 
> > 
> > General idea: We should add as few top level task states as possible 
> > because it is more work for frameworks. TASK_LOST should be used for cases 
> > where we expect a relaunch of the task would succeed (unfortunately this 
> > principle breaks with reconciliation).
> > 
> > 
> > Add 2 new task states to TaskState
> > enum TaskState {
> >   ...
> >   ...
> >   ...,
> >   TASK_UNAUTHORIZED,  # Fold this into TASK_INVALID?
> >   TASK_INVALID # Maybe use TASK_ERROR instead since it already exists but 
> > unused?
> > }
> > 
> > 
> > We add 2 new fields, "source" and "reason"/"code" both enums, to TaskStatus
> > 
> > NOTE: We should take this opportunity to move task validations from 
> > scheduler driver to master, to simplify. Maybe do this as first patch.
> > 
> > enum Source {
> >   MASTER,
> >   SLAVE,
> >   EXECUTOR,
> >   SCHEDULER, # Don't need this when we move validation to master.
> > }
> > 
> > Based on the different status updates, these are the reasons i came up 
> > with. Let me know if you can't figure out which reason should be used where 
> > :)
> > 
> > enum Reason {
> > # Set by master
> > INVALID_OFFERS,
> > SLAVE_REMOVED,
> > SLAVE_DISCONNECTED,
> > SLAVE_UKNOWN,
> > TASK_UNKNOWN,
> > 
> > # Set by scheduler driver for now. But we could kill this and expect 
> > scheduler to not send launch tasks when it is disconnected?
> > MASTER_DISCONNECTED,
> > 
> > # Set by slave
> > GC_ERROR,
> > SLAVE_RESTARTED
> > EXECUTOR_TERMINATED,
> > }
> > 
> > Currently the "Reason" make sense for LOST updates generated by 
> > master/slave. Executors might use this code for udpates they generate, but 
> > it is upto the framework on how to interpret it. We could also consider 
> > adding more reasons for TASK_INVALID/TASK_ERROR which is also generated by 
> > master (e.g., TASK_UNAUTHORIZED could be a reason for TASK_INVALID).
> 
> Bill Farner wrote:
>     This looks good; i have one addendum: frameworks must not be allowed to 
> set status update fields in ways that conflict with the master/slave.  i.e. 
> an executor should not be allowed to specify the `Source` (or if it does, 
> mesos should overwrite it).
> 
> Vinod Kone wrote:
>     yup. that definitely was on my mind :)

Looks good to me. We can also add `TASK_FAILED` reasons and `TASK_KILLED` 
explanations to the `Reason` enum. Generally, my proposal is to use `Reason` 
for all second-tier states.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26382/#review55668
-----------------------------------------------------------


On Oct. 9, 2014, 1:18 a.m., Dominic Hamon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26382/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2014, 1:18 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-1, MESOS-1830 and MESOS-343
>     https://issues.apache.org/jira/browse/MESOS-1
>     https://issues.apache.org/jira/browse/MESOS-1830
>     https://issues.apache.org/jira/browse/MESOS-343
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Annotating every TASK_LOST with comments to open discussion.
> 
> If we add a 'source' field and consider adding TASK_INVALID i think it adds 
> much more information. I don't think the metrics would have to change as the 
> source matches the source file, I think. Unless I missed a subtlety. Ie, some 
> of the master TASK_LOST could be set to slave source, but i think it's 
> debatable.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp f05275b00635cee82007ed851bba1cd30a7aa74f 
>   src/sched/sched.cpp a37ed3d2e11035650b9bf0440fb87f66669129d8 
>   src/scheduler/scheduler.cpp fb88a3e029e97ba33eae5d50503be5ed9c9533e6 
>   src/slave/slave.cpp e56dcbd80114730949a0d4b553470802a4d38281 
> 
> Diff: https://reviews.apache.org/r/26382/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Dominic Hamon
> 
>

Reply via email to