Re: [IEP-35] GridJobProcessorMetrics migration

Nikolay Izhikov Mon, 24 Jun 2019 05:16:04 -0700

Hello, Ivan.

> Ignite is a cluster which almost every
> time assumes external monitoring for production use.


+1.

> 1. Are we going to preserve compatibility with the metrics present
> before? Or are we going to keep only those that make sense today?

1. Backward compatibility is preserved.
2. Deprecated metrics (and metric APIs) will be removed in Ignite 3.
3. We should decide which numbers make sense and which don't.

> 2. Can we configure which supported metrics are calculated/exposed? Or
> do we calculate/expose everything every time?

1. You can configure a filter for the exposed metrics, so only the required
subset of metrics will be exported.
2. For now, all metrics (not lists!) will be calculated. Please note that
every metric is a simple long (or double) counter.
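
To illustrate points 1 and 2 above, here is a minimal sketch, assuming a
hypothetical registry (`MetricFilterSketch` is an illustrative name, not
Ignite's actual API) in which every metric is a named `LongAdder` counter and
the export filter is a plain name predicate. All counters are always updated;
only those passing the filter are exported.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Predicate;

/** Hypothetical sketch: every metric is a plain long counter; export applies a name filter. */
public class MetricFilterSketch {
    /** All metrics are always calculated: name -> counter. */
    private final Map<String, LongAdder> metrics = new ConcurrentHashMap<>();

    /** Returns (creating if needed) the counter registered under the given name. */
    public LongAdder counter(String name) {
        return metrics.computeIfAbsent(name, n -> new LongAdder());
    }

    /** Snapshot of only those metrics whose names pass the export filter, sorted by name. */
    public Map<String, Long> export(Predicate<String> filter) {
        Map<String, Long> out = new TreeMap<>();
        metrics.forEach((name, value) -> {
            if (filter.test(name))
                out.put(name, value.sum());
        });
        return out;
    }

    public static void main(String[] args) {
        MetricFilterSketch registry = new MetricFilterSketch();
        registry.counter("compute.jobs.started").add(3);
        registry.counter("compute.jobs.finished").add(2);
        registry.counter("cache.puts").add(10);

        // Only the "compute." subset is exported; "cache.puts" is still counted.
        // Prints: {compute.jobs.finished=2, compute.jobs.started=3}
        System.out.println(registry.export(name -> name.startsWith("compute.")));
    }
}
```

Since filtering happens only at export time, the cost of an unexported metric
is one counter update, which matches the "calculate everything, expose a
subset" approach described above.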

On Mon, 24/06/2019 at 14:43 +0300, Павлухин Иван wrote:
> Hi Nikolay, Alex,
> 
> A couple of my humble comments:
> > Aggregation should be done by the metric collection system (Prometheus,
> > Graphite, etc.).
> 
> I like that statement very much!
> 
> > But what if a user doesn't use any external monitoring system and wants to
> > know the health of an Ignite instance?
> 
> I think that we can add more capabilities if real user demand
> appears in the future. Generally, Ignite is a cluster which almost every
> time assumes external monitoring for production use.
> 
> And a couple of general questions regarding monitoring. If they are
> answered in the IEP, you can simply redirect me there.
> 1. Are we going to preserve compatibility with the metrics present
> before? Or are we going to keep only those that make sense today?
> 2. Can we configure which supported metrics are calculated/exposed? Or
> do we calculate/expose everything every time?
> 
> On Mon, 24 Jun 2019 at 12:46, Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > 
> > Hi Nikolay,
> > 
> > I think "idle time" is a useful metric, but it can be calculated outside of
> > Ignite using an external monitoring system.
> > 
> > As for execution and waiting time, calculating them from a jobs list is
> > not the right way. Will the jobs list contain only active jobs? In that
> > case, you can't calculate these metrics at all, since you don't know the
> > times of finished jobs. If the list contains all jobs (will it be
> > unbounded?), iterating over it will be resource-consuming. Either way,
> > it's much simpler (and sometimes the only possible way) for an external
> > monitoring system to just read a scalar metric than to iterate over a list
> > with some condition.
> > 
> > As for aggregation: yes, in an ideal world aggregation should be done by
> > the external monitoring system. But what if a user doesn't use any
> > external monitoring system and wants to know the health of an Ignite
> > instance? Do we have any plans to implement a simple aggregator and ship
> > it with Ignite? Do we have plans to provide presets for Ignite monitoring
> > in popular monitoring systems? (These questions are not related to this
> > PR, but to the IEP as a whole.)
> > 
> > Also, some aggregate metrics ("max", for example) can't be calculated
> > efficiently by an external system (you would have to iterate over a jobs
> > list again, and even then the precision of such a calculation would be no
> > better than the interval between probes).
> 
> 
>
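
Alex's point about scalar metrics can be illustrated with a minimal sketch,
assuming a hypothetical `JobTimeMetricsSketch` class (an illustrative name, not
Ignite's actual `GridJobProcessor` code). Each finished job updates a few
counters (job count, total wait time, total execution time) and an exact
in-process maximum, so no jobs list has to be kept or iterated. An external
system can then derive averages from the deltas between two probes.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch: job timing metrics kept as scalar counters that are
 * updated once per finished job, so no list of jobs is stored or iterated.
 */
public class JobTimeMetricsSketch {
    private final LongAdder finishedJobs = new LongAdder();
    private final LongAdder totalExecTimeMs = new LongAdder();
    private final LongAdder totalWaitTimeMs = new LongAdder();
    /** Exact maximum since start; an external poller could only see it at probe granularity. */
    private final AtomicLong maxExecTimeMs = new AtomicLong();

    /** Called when a job finishes; the caller supplies its wait and execution durations. */
    public void onJobFinished(long waitTimeMs, long execTimeMs) {
        finishedJobs.increment();
        totalWaitTimeMs.add(waitTimeMs);
        totalExecTimeMs.add(execTimeMs);
        maxExecTimeMs.accumulateAndGet(execTimeMs, Math::max);
    }

    public long finishedJobs() { return finishedJobs.sum(); }
    public long totalExecTimeMs() { return totalExecTimeMs.sum(); }
    public long totalWaitTimeMs() { return totalWaitTimeMs.sum(); }
    public long maxExecTimeMs() { return maxExecTimeMs.get(); }

    public static void main(String[] args) {
        JobTimeMetricsSketch m = new JobTimeMetricsSketch();
        m.onJobFinished(5, 120);
        m.onJobFinished(2, 300);
        m.onJobFinished(1, 80);

        // An external system can derive the average execution time between two probes as
        // (totalExec2 - totalExec1) / (finished2 - finished1).
        // Prints: 3 jobs, total exec 500 ms, max exec 300 ms
        System.out.println(m.finishedJobs() + " jobs, total exec " + m.totalExecTimeMs()
            + " ms, max exec " + m.maxExecTimeMs() + " ms");
    }
}
```

The counters are updated independently rather than as one atomic snapshot, so
a probe racing with `onJobFinished` may briefly see the count incremented
before the totals; for monitoring purposes such skew is usually acceptable.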
