Hello everyone,

I am trying to enhance Flink's monitoring capabilities along the lines of
the GSoC 2014 proposal by Rajika Kumarasiri [1].

Short abstract:
He suggested using the Java standard for this, the Java Management
Extensions (JMX).
The idea is to run an MBean server in the JobManager, so that the
JobManager itself and all TaskManagers in the cluster can register their
MBeans with this server via RMI.
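A minimal sketch of the JMX side, assuming plain standard MBeans. `JobManagerJmx`, `TaskManagerStats` and the `flink:type=TaskManager` object name are hypothetical illustrations, not existing Flink classes. It registers an MBean with an MBean server, reads it back by attribute name, and shows how the server could be exposed over RMI with the standard `JMXConnectorServerFactory`:

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JobManagerJmx {

    /** Standard MBean contract: the interface name is the class name plus "MBean". */
    public interface TaskManagerStatsMBean {
        long getProcessedRecords();
        int getRunningTasks();
    }

    /** Hypothetical per-TaskManager statistics exposed as a standard MBean. */
    public static class TaskManagerStats implements TaskManagerStatsMBean {
        private final AtomicLong processedRecords = new AtomicLong();
        private volatile int runningTasks;

        public void recordProcessed(long n) { processedRecords.addAndGet(n); }
        public void setRunningTasks(int n) { runningTasks = n; }

        @Override public long getProcessedRecords() { return processedRecords.get(); }
        @Override public int getRunningTasks() { return runningTasks; }
    }

    /** Registers the MBean under a hypothetical "flink" domain and returns its name. */
    public static ObjectName register(MBeanServer server, String taskManagerId,
                                      TaskManagerStats stats) throws Exception {
        ObjectName name = new ObjectName(
                "flink:type=TaskManager,id=" + ObjectName.quote(taskManagerId));
        server.registerMBean(stats, name);
        return name;
    }

    /** Exposes the MBean server to remote JMX clients (e.g. jconsole) over RMI. */
    public static JMXConnectorServer startRmiConnector(MBeanServer server, int port)
            throws Exception {
        LocateRegistry.createRegistry(port);
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer connector =
                JMXConnectorServerFactory.newJMXConnectorServer(url, null, server);
        connector.start();
        return connector;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        TaskManagerStats stats = new TaskManagerStats();
        ObjectName name = register(server, "tm-1", stats);
        stats.setRunningTasks(3);
        stats.recordProcessed(1_000);
        // Attributes are read by name through the MBean server, as a JMX client would.
        System.out.println(server.getAttribute(name, "RunningTasks"));     // 3
        System.out.println(server.getAttribute(name, "ProcessedRecords")); // 1000
        // startRmiConnector(server, 9999) would make this reachable remotely;
        // 9999 is an arbitrary example port.
    }
}
```

One design caveat worth checking: a remote JMX client only gets an `MBeanServerConnection`, which has no `registerMBean` method, so a TaskManager cannot hand a local MBean object to the JobManager's server directly. It could instead run its own MBean server, or push its values to an MBean living on the JobManager.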
Different monitoring levels (none, standard, full) limit the impact on
system performance.
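The tiered levels can be pictured as cumulative sets of collected metrics. This is a minimal sketch; the enum and the metric names are hypothetical placeholders, not taken from the proposal:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MonitoringLevels {

    /** Hypothetical levels mirroring the proposal's "none / standard / full" stages. */
    public enum MonitoringLevel { NONE, STANDARD, FULL }

    /**
     * Returns the (hypothetical) metric names collected at a given level.
     * Each level includes everything from the levels below it, so the extra
     * collection cost is only paid when a higher level is explicitly chosen.
     */
    public static List<String> enabledMetrics(MonitoringLevel level) {
        List<String> metrics = new ArrayList<>();
        if (level.compareTo(MonitoringLevel.STANDARD) >= 0) {
            // Cheap, coarse-grained values that can be sampled periodically.
            metrics.add("heapUsedBytes");
            metrics.add("systemLoadAverage");
        }
        if (level.compareTo(MonitoringLevel.FULL) >= 0) {
            // More expensive, fine-grained values (e.g. per task).
            metrics.add("recordsPerTask");
            metrics.add("gcTimeMillis");
        }
        return Collections.unmodifiableList(metrics);
    }

    public static void main(String[] args) {
        for (MonitoringLevel level : MonitoringLevel.values()) {
            System.out.println(level + " -> " + enabledMetrics(level));
        }
    }
}
```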
The JMX service should be accessible from an improved web component through
a RESTful API.
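To make the REST idea concrete, here is a minimal sketch that serves MBean attributes as JSON using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/metrics/runtime` path, the port and the class name are hypothetical; the real web component could use any HTTP stack. It reads the `Uptime` and `StartTime` attributes of the platform Runtime MXBean:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRestEndpoint {

    /**
     * Reads the given attributes of one MBean and renders them as a flat JSON object.
     * Assumes numeric attribute values, so no string escaping is done.
     */
    public static String attributesAsJson(MBeanServer server, ObjectName name,
                                          String... attributes) throws Exception {
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < attributes.length; i++) {
            if (i > 0) json.append(',');
            Object value = server.getAttribute(name, attributes[i]);
            json.append('"').append(attributes[i]).append("\":").append(value);
        }
        return json.append('}').toString();
    }

    /** Starts a hypothetical GET /metrics/runtime endpoint backed by the Runtime MXBean. */
    public static HttpServer start(int port) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
        HttpServer http = HttpServer.create(new InetSocketAddress(port), 0);
        http.createContext("/metrics/runtime", exchange -> {
            byte[] body;
            int status;
            String contentType;
            try {
                body = attributesAsJson(mbs, runtime, "Uptime", "StartTime")
                        .getBytes(StandardCharsets.UTF_8);
                status = 200;
                contentType = "application/json";
            } catch (Exception e) {
                body = e.toString().getBytes(StandardCharsets.UTF_8);
                status = 500;
                contentType = "text/plain";
            }
            exchange.getResponseHeaders().set("Content-Type", contentType);
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        http.start();
        return http;
    }

    public static void main(String[] args) throws Exception {
        int port = 8081; // arbitrary example port
        start(port);
        System.out.println("Try: curl http://localhost:" + port + "/metrics/runtime");
    }
}
```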
He also suggested using the SIGAR [2] JNI library to gather system
information.
In my opinion this point is debatable. The JDK already ships platform
MXBeans [3] (java.lang.management, extended in Java 7) that cover the basic
system information, so in my eyes a JNI library might be a little overkill.
Of course, this depends on how deep the monitoring is meant to go.
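As a rough illustration of what the platform MXBeans already provide without any JNI library, this sketch collects a few values via `ManagementFactory`. The selection of metrics and the map keys are my own choice; `com.sun.management.OperatingSystemMXBean` is a JDK-specific extension, so it is only used when available:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlatformMetrics {

    /** Takes one snapshot of basic JVM and OS metrics from the platform MXBeans. */
    public static Map<String, Number> snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        Map<String, Number> m = new LinkedHashMap<>();
        m.put("availableProcessors", os.getAvailableProcessors());
        m.put("systemLoadAverage", os.getSystemLoadAverage()); // negative if unavailable
        m.put("heapUsedBytes", memory.getHeapMemoryUsage().getUsed());
        m.put("heapMaxBytes", memory.getHeapMemoryUsage().getMax()); // -1 if undefined
        m.put("liveThreads", threads.getThreadCount());
        m.put("uptimeMillis", runtime.getUptime());

        // JDK-specific extension with CPU load; not guaranteed on every JVM.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext =
                    (com.sun.management.OperatingSystemMXBean) os;
            m.put("processCpuLoad", ext.getProcessCpuLoad()); // 0.0..1.0, negative if unavailable
        }
        return m;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Anything beyond this (e.g. per-disk or per-network-interface statistics) is where a native library like SIGAR would start to pay off.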

So the primary question:
Which parameters, system properties, utilizations and workloads should be
monitored, in your opinion?

Have a nice weekend!
Nils

[1] https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Rajika-Kumarasiri
[2] https://support.hyperic.com/display/SIGAR/Home
[3] https://docs.oracle.com/javase/7/docs/technotes/guides/management/overview.html
