Hi Nikolay, Alex,

A couple of humble comments from me.
> Aggregation should be done with the metric collection system (Prometheus,
> Graphite, etc.).
I like that statement very much!

> But, what if a user doesn't use any external monitoring system and wants to 
> know the health of Ignite instance?
I think we can add more capabilities if real user demand appears in
the future. In general, Ignite is a clustered system, and production
deployments almost always rely on external monitoring.

And a couple of general questions regarding monitoring. If they are
answered in the IEP, you can simply redirect me there.
1. Are we going to preserve compatibility with the metrics that existed
before? Or are we going to keep only those that make sense today?
2. Can we configure which supported metrics are calculated/exposed? Or
do we always calculate/expose everything?

Mon, 24 Jun 2019 at 12:46, Alex Plehanov <plehanov.a...@gmail.com>:
>
> Hi Nikolay,
>
> I think "idle time" is a useful metric, but it can be calculated outside of
> Ignite using external monitoring system.
>
> About execution and waiting time, it's not the right way to calculate it
> using a jobs list. Will jobs list contain only active jobs? In this case,
> you can't calculate these metrics at all, since you don't know the time of
> finished jobs. If the list will contain all jobs (will it be unlimited?),
> iterating over this list will be resource consuming. In any way, it's much
> simpler (and sometimes only possible) for an external monitoring system to
> just get some scalar metric than iterate over a list with some condition.
>
> About aggregation, yes, in an ideal world aggregation should be done with
> the external monitoring system. But, what if a user doesn't use any
> external monitoring system and wants to know the health of Ignite instance?
> Do we have any plans to implement some simple aggregator and ship it with
> Ignite? Do we have plans to provide some presets for Ignite monitoring for
> popular monitoring systems? (These questions are not related to this PR, but
> to the IEP as a whole.)
>
> Also, some aggregation metrics ("max" for example) can't be effectively
> calculated using the external system (you should iterate over a jobs list
> again and still precision of such calculation will be no more than the time
> between probes).
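
To illustrate the point about "max" above, here is a minimal sketch. It
assumes an external system that only samples the latest job duration once
per probe, and compares that with a scalar maximum maintained inside the
process. All names (MaxMetricSketch, MaxDuration, sampledMax, exactMax) are
hypothetical and are not part of the Ignite API; probes are modeled as
"every N finished jobs" rather than wall-clock time to keep it short.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MaxMetricSketch {
    /** Hypothetical in-process scalar metric, updated on every job completion. */
    static final class MaxDuration {
        private final AtomicLong max = new AtomicLong();

        void onJobFinished(long durationMs) {
            // Atomically keep the larger of the stored and the new value.
            max.accumulateAndGet(durationMs, Math::max);
        }

        long value() {
            return max.get();
        }
    }

    /** Maximum as seen in-process: every finished job is observed. */
    static long exactMax(long[] durationsMs) {
        MaxDuration m = new MaxDuration();
        for (long d : durationsMs)
            m.onJobFinished(d);
        return m.value();
    }

    /**
     * Maximum as seen by an external prober that reads only the most recent
     * duration once every probeEvery finished jobs (an assumption of this sketch).
     */
    static long sampledMax(long[] durationsMs, int probeEvery) {
        long max = 0;
        for (int i = probeEvery - 1; i < durationsMs.length; i += probeEvery)
            max = Math.max(max, durationsMs[i]);
        return max;
    }

    public static void main(String[] args) {
        // One slow job (900 ms) falls between two probes.
        long[] durationsMs = {10, 12, 900, 11, 9, 13, 10, 12};
        System.out.println("exact max   = " + exactMax(durationsMs));
        System.out.println("sampled max = " + sampledMax(durationsMs, 4));
    }
}
```

With these inputs the in-process value reflects the 900 ms job, while the
sampled value does not, because the slow job finished between two probes.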



-- 
Best regards,
Ivan Pavlukhin
