Hi Nikolay, Alex,

A couple of my humble comments.

> Aggregation should be done with the metric collect system(Prometheus,
> Graphite, etc.).
I like that statement very much!

> But, what if a user doesn't use any external monitoring system and wants to
> know the health of an Ignite instance?
I think that we can add more capabilities if real user demand appears in the
future. Generally, Ignite is a clustered system, and production use almost
always assumes external monitoring.

And a couple of general questions regarding monitoring. If they are answered
in the IEP, you can simply redirect me there.
1. Are we going to preserve compatibility with the metrics that existed
before? Or are we going to keep only those that make sense today?
2. Can we configure which supported metrics are calculated/exposed? Or do we
calculate/expose everything every time?

On Mon, Jun 24, 2019 at 12:46, Alex Plehanov <plehanov.a...@gmail.com> wrote:
>
> Hi Nikolay,
>
> I think "idle time" is a useful metric, but it can be calculated outside of
> Ignite using an external monitoring system.
>
> About execution and waiting time, it's not the right way to calculate it
> using a jobs list. Will the jobs list contain only active jobs? In this case,
> you can't calculate these metrics at all, since you don't know the time of
> finished jobs. If the list will contain all jobs (will it be unlimited?),
> iterating over this list will be resource-consuming. In any case, it's much
> simpler (and sometimes only possible) for an external monitoring system to
> just get some scalar metric than to iterate over a list with some condition.
>
> About aggregation, yes, in an ideal world aggregation should be done with
> the external monitoring system. But, what if a user doesn't use any
> external monitoring system and wants to know the health of an Ignite
> instance? Do we have any plans to implement some simple aggregator and ship
> it with Ignite? Do we have plans to provide some presets for Ignite
> monitoring for popular monitoring systems?
> (These questions are not related to this PR, but are related to the IEP as
> a whole.)
>
> Also, some aggregation metrics ("max" for example) can't be effectively
> calculated using the external system (you would have to iterate over a jobs
> list again, and the precision of such a calculation would still be no better
> than the time between probes).

--
Best regards,
Ivan Pavlukhin