> On Oct. 7, 2014, 6:50 p.m., Vinod Kone wrote: > > OK. Here is a proposal for what it could look like. > > > > > > General idea: We should add as few top level task states as possible > > because it is more work for frameworks. TASK_LOST should be used for cases > > where we expect a relaunch of the task would succeed (unfortunately this > > principle breaks with reconciliation). > > > > > > Add 2 new task states to TaskState > > enum TaskState { > > ... > > ... > > ..., > > TASK_UNAUTHORIZED, # Fold this into TASK_INVALID? > > TASK_INVALID # Maybe use TASK_ERROR instead since it already exists but > > unused? > > } > > > > > > We add 2 new fields, "source" and "reason"/"code" both enums, to TaskStatus > > > > NOTE: We should take this opportunity to move task validations from > > scheduler driver to master, to simplify. Maybe do this as first patch. > > > > enum Source { > > MASTER, > > SLAVE, > > EXECUTOR, > > SCHEDULER, # Don't need this when we move validation to master. > > } > > > > Based on the different status updates, these are the reasons i came up > > with. Let me know if you can't figure out which reason should be used where > > :) > > > > enum Reason { > > # Set by master > > INVALID_OFFERS, > > SLAVE_REMOVED, > > SLAVE_DISCONNECTED, > > SLAVE_UKNOWN, > > TASK_UNKNOWN, > > > > # Set by scheduler driver for now. But we could kill this and expect > > scheduler to not send launch tasks when it is disconnected? > > MASTER_DISCONNECTED, > > > > # Set by slave > > GC_ERROR, > > SLAVE_RESTARTED > > EXECUTOR_TERMINATED, > > } > > > > Currently the "Reason" make sense for LOST updates generated by > > master/slave. Executors might use this code for udpates they generate, but > > it is upto the framework on how to interpret it. We could also consider > > adding more reasons for TASK_INVALID/TASK_ERROR which is also generated by > > master (e.g., TASK_UNAUTHORIZED could be a reason for TASK_INVALID). > > Bill Farner wrote: > This looks good; i have one addendum: frameworks must not be allowed to > set status update fields in ways that conflict with the master/slave. i.e. > an executor should not be allowed to specify the `Source` (or if it does, > mesos should overwrite it). > > Vinod Kone wrote: > yup. that definitely was on my mind :)
Looks good to me. We can also add `TASK_FAILED` reasons and `TASK_KILLED` explanations to the `Reason` enum. Generally, my proposal is to use `Reason` for all second-tier states. - Alexander ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26382/#review55668 ----------------------------------------------------------- On Oct. 9, 2014, 1:18 a.m., Dominic Hamon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/26382/ > ----------------------------------------------------------- > > (Updated Oct. 9, 2014, 1:18 a.m.) > > > Review request for mesos and Vinod Kone. > > > Bugs: MESOS-1, MESOS-1830 and MESOS-343 > https://issues.apache.org/jira/browse/MESOS-1 > https://issues.apache.org/jira/browse/MESOS-1830 > https://issues.apache.org/jira/browse/MESOS-343 > > > Repository: mesos-git > > > Description > ------- > > Annotating every TASK_LOST with comments to open discussion. > > If we add a 'source' field and consider adding TASK_INVALID i think it adds > much more information. I don't think the metrics would have to change as the > source matches the source file, I think. Unless I missed a subtlety. Ie, some > of the master TASK_LOST could be set to slave source, but i think it's > debatable. > > > Diffs > ----- > > src/master/master.cpp f05275b00635cee82007ed851bba1cd30a7aa74f > src/sched/sched.cpp a37ed3d2e11035650b9bf0440fb87f66669129d8 > src/scheduler/scheduler.cpp fb88a3e029e97ba33eae5d50503be5ed9c9533e6 > src/slave/slave.cpp e56dcbd80114730949a0d4b553470802a4d38281 > > Diff: https://reviews.apache.org/r/26382/diff/ > > > Testing > ------- > > > Thanks, > > Dominic Hamon > >