Hello, Alex.

Based on our private discussion, I've additionally migrated the
`totalExecutionTime` and `totalWaitingTime` counters.
Can you review the PR [1]?

[1] https://github.com/apache/ignite/pull/6622

On Mon, 24/06/2019 at 15:14 +0300, Nikolay Izhikov wrote:
> Hello, Alex.
> 
> Thanks for the answer.
> 
> 1. Actually, I don't understand your proposal :)
> Can you write it down?
> Which numbers should be additionally migrated in this PR?
> Or is it OK for now?
> 
> > I think "idle time" is a useful metric
> 
> I think the "usefulness" or "uselessness" of a specific metric depends on
> the questions we can answer with it.
> What questions about an Ignite instance can we answer with the "idle time"
> metric?
> 
> > About execution and waiting time, it's not the right way to calculate it
> > using a jobs list. 
> 
> Same question here.
> 
> What questions can we answer with the current numbers?
> 
> > Will jobs list contain only active jobs?
> 
> All jobs that are scheduled for execution on the node (active + waiting)
> should be in the list.
> I'll try to give more details here to explain how I think about metrics
> and lists:
> 
> If you have issues with jobs on a node, they can be of two kinds:
>       1. You are waiting for the results of some job and want to know why it
> isn't executing.
> 
>               In this case, you should query the "jobs list" from Ignite.
>               It can answer these questions:
>                       * Which jobs are currently executing?
>                       * How long has your job been waiting to be executed?
> 
>               You can also check the graphs of the "activeJobs" and
> "waitingJobs" metrics to see how the job queue changes over time.
>               It seems you can predict the start of your job from these
> numbers.
> 
>       2. You want to understand the lifecycle of some finished (or failed) job.
> 
>               In this case, you should analyze the log of the node.
>               It should contain the times when:
>                       * the node received the job information
>                       * the job was added to the queue
>                       * the job started execution
>                       * the job finished (or failed) execution.
> 
> I don't see any questions that these sources can't answer.
> Are there any?
> How can the numbers from the current GridJobMetrics help with these questions?
> 
> 
> > But, what if a user doesn't use any
> > external monitoring system and wants to know the health of Ignite instance?
> 
> It depends on how we define "health".
> And it's not a trivial question :)
> 
> > Do we have any plans to implement some simple aggregator and ship it with 
> > Ignite?
> 
> I think NO.
> We shouldn't do it.
> 
> > Do we have plans to provide some presets for Ignite monitoring for
> > popular monitoring systems?
> 
> I think we shouldn't do it,
> because monitoring presets depend heavily on the usage scenario,
> and usage scenarios vary widely for Ignite.
> 
> 
> On Mon, 24/06/2019 at 12:46 +0300, Alex Plehanov wrote:
> > Hi Nikolay,
> > 
> > I think "idle time" is a useful metric, but it can be calculated outside of
> > Ignite using an external monitoring system.
> > 
> > About execution and waiting time, it's not the right way to calculate it
> > using a jobs list. Will the jobs list contain only active jobs? In that case,
> > you can't calculate these metrics at all, since you don't know the time of
> > finished jobs. If the list contains all jobs (will it be unlimited?),
> > iterating over it will be resource-consuming. In any case, it's much
> > simpler (and sometimes the only option) for an external monitoring system to
> > just get a scalar metric than to iterate over a list with some condition.
> > 
> > About aggregation, yes, in an ideal world aggregation should be done by
> > the external monitoring system. But what if a user doesn't use any
> > external monitoring system and wants to know the health of an Ignite
> > instance? Do we have any plans to implement a simple aggregator and ship it
> > with Ignite? Do we have plans to provide presets for Ignite monitoring for
> > popular monitoring systems? (These questions are not related to this PR, but
> > to the IEP as a whole.)
> > 
> > Also, some aggregation metrics ("max", for example) can't be calculated
> > efficiently by the external system (you would have to iterate over a jobs
> > list again, and even then the precision of such a calculation would be no
> > better than the interval between probes).
