Hi Igniters!

My team and I are building a monitoring system on top of the new metrics framework described in the following IEP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392

So far it's going well, but we'd like to improve the way metrics are exported from Ignite.

There are different kinds of metrics that you can access through this framework. Some of them are local to a node, such as used heap or CPU load. It makes sense to send these independently from every node to the centralized storage. Let's assume that we attach the node ID to metric names, so that we can distinguish between metrics coming from different nodes. Local metrics can then be selected with patterns on metric names. For example, if I want to draw a chart of CPU load on every node, I can use a pattern similar to the following one: sys.CpuLoad.*

There are also metrics that have the same value no matter which node they are taken from. For example, cache size, rebalance progress and topology version are global things that don't depend on the node. If I take any one of the metrics matching the pattern pme.Duration.*, I will get what I need.

I wonder, what is the recommended approach to global metrics? I know that there are tools like Prometheus and Graphite that allow similar manipulations with metric names. Are global and local metrics supposed to be differentiated on the monitoring tool side, using functions like any(pme.Duration.*)? Graphite, for example, seems to lack such a function.

Maybe it makes sense to introduce a property on metrics that lets exporters distinguish between the two kinds, so that the names of global metrics are not parameterized with the node ID?

What do you think?

Denis
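
P.S. To make the last idea a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the Scope enum, the exportName helper and the "node-1" ID are made up for illustration and are not part of the existing metrics API.

```java
/** Hypothetical sketch only; none of these types exist in Ignite. */
class MetricNameSketch {
    /** Hypothetical property telling an exporter whether a metric is per-node or cluster-wide. */
    enum Scope { LOCAL, GLOBAL }

    /** Builds the name an exporter would send to the centralized storage. */
    static String exportName(String metricName, Scope scope, String nodeId) {
        // Local metrics get the node ID appended; global metrics keep one shared name.
        return scope == Scope.LOCAL ? metricName + "." + nodeId : metricName;
    }

    public static void main(String[] args) {
        System.out.println(exportName("sys.CpuLoad", Scope.LOCAL, "node-1"));   // sys.CpuLoad.node-1
        System.out.println(exportName("pme.Duration", Scope.GLOBAL, "node-1")); // pme.Duration
    }
}
```

With something like this, sys.CpuLoad.* would still match one series per node, while pme.Duration would be a single series, and no any()-style function would be needed on the monitoring side.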