Hi Devs,

I would like to start a discussion on FLIP-361: Improve GC Metrics [1].

The current Flink GC metrics [2] are not very useful for monitoring
purposes because they require post-processing logic that also depends on
the current runtime environment.

Problems:
 - Total time is not very relevant for long-running applications; only the
rate of change (msPerSec) is
 - In most cases it's best to simply aggregate the time/count across the
different GarbageCollectors; however, the specific collectors depend
on the current Java runtime

We propose to improve the current situation by:
 - Exposing rate metrics per GarbageCollector
 - Exposing aggregated Total time/count/rate metrics

These new metrics are all derived from the existing ones with minimal
overhead.
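
To illustrate the derivation, here is a minimal sketch in plain Java. It is
not the implementation proposed in the FLIP; the class and method names
(GcRateSketch, totalTimeMs, totalCount, msPerSec) are hypothetical and only
the standard java.lang.management API is assumed. It sums time/count across
whatever collectors the current JVM exposes and computes the rate from two
samples.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only (not Flink code).
class GcRateSketch {

    /** Sum of accumulated GC time in ms across all collectors of this JVM. */
    static long totalTimeMs() {
        long sum = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when undefined for a collector.
            sum += Math.max(0L, bean.getCollectionTime());
        }
        return sum;
    }

    /** Sum of collection counts across all collectors of this JVM. */
    static long totalCount() {
        long sum = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when undefined for a collector.
            sum += Math.max(0L, bean.getCollectionCount());
        }
        return sum;
    }

    /** GC milliseconds spent per second of wall-clock time between two samples. */
    static double msPerSec(long prevTimeMs, long currTimeMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 0.0; // no time elapsed, no meaningful rate
        }
        return (currTimeMs - prevTimeMs) * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long t0 = System.currentTimeMillis();
        long gc0 = totalTimeMs();
        Thread.sleep(1000);
        long t1 = System.currentTimeMillis();
        long gc1 = totalTimeMs();
        System.out.println("Total GC count: " + totalCount());
        System.out.println("Total GC msPerSec: " + msPerSec(gc0, gc1, t1 - t0));
    }
}
```

The per-collector rate would follow the same pattern, applying msPerSec to a
single bean's getCollectionTime() instead of the sum.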

Looking forward to your feedback.

Cheers,
Gyula

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
