Re: Review Request 27531: Update Master and Slave metrics to match task source and reason scheme.

Dominic Hamon Tue, 18 Nov 2014 15:13:55 -0800


> On Nov. 3, 2014, 1:59 p.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 5186-5230
> > <https://reviews.apache.org/r/27531/diff/1/?file=747568#file747568line5186>
> >
> >     Hmm. This is rather unweildy.
> >     
> >     Exposing the cross product of status and source and reason in the 
> > metrics is a bit strange, considering most reasons are only related to 
> > TASK_LOST. IOW, most of those metrics will be 0s which I find weird.
> >     
> >     AFAICT, this is what we have:
> >     
> >     TASK_STAGING, TASK_STARTING, TASK_RUNNING and TASK_FINISHED are sent by 
> > the master (reconciliation) or executor. Either way, there is no reason 
> > associated with them.
> >     
> >     TASK_FAILED is generated by master (reconciliation) or slave (oom or 
> > command executor failed) or executor. I'll comment on the slave aspect in 
> > the dependent review, because i realized it sets incorrect reason.
> >     
> >     TASK_KILLED can be generated by master (reconciliation, pending), slave 
> > (pending, framework removed) or executor.
> >     
> >     TASK_LOST can be generated by master or slave or executor and can 
> > contain any of the reasons.
> >      
> >     
> >     Given the above, I would rather we use explicit combinations of status, 
> > source and reason metrics to capture these semantics, rather than using a 
> > vector of vector of vector.
> 
> Dominic Hamon wrote:
>     All task status can have reason 'reconciliation' or 'None', so there 
> needs to be a reason for everything.
>     
>     I'll code up the explicit combination, but it will likely be as unwieldy 
> in terms of tracking which are valid combinations. It also means that changes 
> to sources/reasons will require API changes whereas this covers every 
> possible future combination and is a complete API.
>     
>     I'm starting to think I should pull the metrics out into a separate file 
> (as per a TODO) to avoid churn. Maybe as a dependent review. What do you 
> think?
> 
> Vinod Kone wrote:
>     ```
>     It also means that changes to sources/reasons will require API changes 
>     ```
>     I don't think I follow. Why?
>     
>     Another thing to consider is to reduce the scope of this work for 0.21.0. 
> Maybe just include reasons for TASK_LOST and nothing else? Would that make it 
> less unweildy for now?
> 
> Dominic Hamon wrote:
>     if we define the explicit set of task state/source/reason tuples and a 
> reason changes for a given state/source, we'd have to deprecate the old 
> combination to introduce the new one. concrete example:
>     
>     TASK_FAILED/SOURCE_SLAVE/REASON_EXECUTOR_TERMINATED will become
>     TASK_FAILED/SOURCE_SLAVE/REASON_OOM and 
> TASK_FAILED/SOURCE_SLAVE/REASON_INVALID_COMMAND in a future patch
>     
>     we'd then need to deprecate the REASON_EXECUTOR_TERMINATED metric through 
> a cycle as it would then be unused.
>     
>     Actually, thinking about it more, we'd have to do that anyway and the 
> explicit list of metrics makes it easier!
>     
>     let me code up the explicit combinations and see what it looks like.


i was thinking about this too hard. making it simpler.. incoming patch.

sorry for the pointless review.


- Dominic


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27531/#review59613
-----------------------------------------------------------


On Nov. 4, 2014, 8:58 a.m., Dominic Hamon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27531/
> -----------------------------------------------------------
> 
> (Updated Nov. 4, 2014, 8:58 a.m.)
> 
> 
> Review request for mesos, Tobias Weingartner and Vinod Kone.
> 
> 
> Bugs: MESOS-1830
>     https://issues.apache.org/jira/browse/MESOS-1830
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Update metrics in Master to match the source and reason split for task 
> statuses.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 687f1789712dcd867b411badd85f4a12ae8f16d7 
>   src/master/master.cpp d914786a2517677ee7dd4a3130b4b191ef416c26 
>   src/slave/slave.hpp 6c183f827ec2521052f8b088d4583f13e661db2c 
>   src/slave/slave.cpp b8935173bae9c124b8d08db590893334d7a45a23 
>   src/tests/master_tests.cpp 2e525749247626c05efb2f54a707599facb114b6 
>   src/tests/slave_tests.cpp d2cbaf82391411bdc5d85327478d7ca8072048af 
> 
> Diff: https://reviews.apache.org/r/27531/diff/
> 
> 
> Testing
> -------
> 
> make check
> run master and check endpoint:
> 
> {
> ...
> master/reconciliation_states/task_error: 0,
> master/reconciliation_states/task_failed: 0,
> master/reconciliation_states/task_finished: 0,
> master/reconciliation_states/task_killed: 0,
> master/reconciliation_states/task_lost: 0,
> master/reconciliation_states/task_running: 0,
> master/reconciliation_states/task_staging: 0,
> master/reconciliation_states/task_starting: 0,
> master/recovery_slave_removals: 0,
> master/slave_registrations: 1,
> master/slave_removals: 0,
> master/slave_reregistrations: 0,
> master/slaves_active: 1,
> master/slaves_connected: 1,
> master/slaves_disconnected: 0,
> master/slaves_inactive: 0,
> master/tasks_failed: 0,
> master/tasks_finished: 0,
> master/tasks_killed: 0,
> master/tasks_lost: 0,
> master/tasks_lost/reason_executor_terminated: 0,
> master/tasks_lost/reason_executor_unregistered: 0,
> master/tasks_lost/reason_framework_removed: 0,
> master/tasks_lost/reason_gc_error: 0,
> master/tasks_lost/reason_invalid_command: 0,
> master/tasks_lost/reason_invalid_frameworkid: 0,
> master/tasks_lost/reason_invalid_offers: 0,
> master/tasks_lost/reason_master_disconnected: 0,
> master/tasks_lost/reason_memory_limit: 0,
> master/tasks_lost/reason_reconciliation: 0,
> master/tasks_lost/reason_slave_disconnected: 0,
> master/tasks_lost/reason_slave_removed: 0,
> master/tasks_lost/reason_slave_restarted: 0,
> master/tasks_lost/reason_slave_unknown: 0,
> master/tasks_lost/reason_task_invalid: 0,
> master/tasks_lost/reason_task_unauthorized: 0,
> master/tasks_lost/reason_task_unknown: 0,
> ...
> }
> 
> and slave:
> 
> {
> ...
> slave/tasks_lost: 0,
> slave/tasks_lost/reason_executor_terminated: 0,
> slave/tasks_lost/reason_executor_unregistered: 0,
> slave/tasks_lost/reason_framework_removed: 0,
> slave/tasks_lost/reason_gc_error: 0,
> slave/tasks_lost/reason_invalid_command: 0,
> slave/tasks_lost/reason_invalid_frameworkid: 0,
> slave/tasks_lost/reason_invalid_offers: 0,
> slave/tasks_lost/reason_master_disconnected: 0,
> slave/tasks_lost/reason_memory_limit: 0,
> slave/tasks_lost/reason_reconciliation: 0,
> slave/tasks_lost/reason_slave_disconnected: 0,
> slave/tasks_lost/reason_slave_removed: 0,
> slave/tasks_lost/reason_slave_restarted: 0,
> slave/tasks_lost/reason_slave_unknown: 0,
> slave/tasks_lost/reason_task_invalid: 0,
> slave/tasks_lost/reason_task_unauthorized: 0,
> slave/tasks_lost/reason_task_unknown: 0,
> slave/tasks_running: 0,
> slave/tasks_staging: 0,
> slave/tasks_starting: 0,
> ...
> }
> 
> 
> Thanks,
> 
> Dominic Hamon
> 
>

Re: Review Request 27531: Update Master and Slave metrics to match task source and reason scheme.

Reply via email to