Hi Robert,

From what I have seen so far, it is probably better and easier for Flink to leverage the Metrics library [1] for metrics collection rather than building its own organically.
Several ASF projects such as Spark [2] and Tajo [3] have used it with great success. The main reasons are maintainability and the breadth of metric types that could and should be collected.

- Henry

[1] https://dropwizard.github.io/metrics/3.1.0/getting-started/
[2] https://spark.apache.org/docs/1.0.1/monitoring.html
[3] https://issues.apache.org/jira/browse/TAJO-333

On Sat, Dec 6, 2014 at 11:13 AM, Robert Metzger <rmetz...@apache.org> wrote:

> Hey Nils,
>
> I have played around a bit with a little prototype. You can find the code
> here: https://github.com/rmetzger/incubator-flink/tree/flink456 (it's
> another branch in my repo).
> You can see the changes that I applied on top of Till's Akka branch here:
> https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1
>
> The code collects statistics about each TaskManager in the
> system. These stats are assembled into a "MetricsReport", which is sent with
> the periodic heartbeat to the JobManager. The JobManager stores the
> latest MetricsReport for each TaskManager (in the Instance object for each
> TM).
> When the user accesses the TaskManager overview, the latest MetricsReport
> is sent as a JSONObject to the browser.
>
> To test my changes, check out the code and build it:
> mvn clean package -DskipTests -Dcheckstyle.skip=true
> go into
> cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
> and start the web interface
> ./bin/start-local.sh
>
> Go to localhost:8081; in the "TaskManager" view, you can see some metrics.
> Here is a screenshot: http://img42.com/eNPve
>
> I named my branch after this issue, as it probably best describes what
> we're working on here: FLINK-456
> <https://issues.apache.org/jira/browse/FLINK-456>
>
> As I said in the beginning, it's really just a prototype. Let me know if you
> have any further questions.
> For the "per TaskManager" reports, we should probably integrate some more
> statistics.
> Also, the presentation of the numbers is very basic right
> now. I think there are many good libraries for visualizing these kinds of
> stats.
> Also, the numbers currently represent only a "snapshot"; however, some of
> the numbers can be accumulated (read/write bytes of the I/O manager).
> Another missing feature is storing a little history of numbers to visualize
> metrics over time.
>
> I'm trying to find time to look into "per job" metrics as well. They will
> require a bit more infrastructure to distinguish them on the JobManager
> side and to get them on the TaskManagers.
>
> Best,
> Robert
>
> On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
> alexander.s.alexand...@gmail.com> wrote:
>
>> Hello Nils,
>>
>> I am going to work on a similar issue related to tracking some basic
>> statistics of the intermediate results produced by dataflows during
>> execution.
>>
>> I just created a Jira issue here:
>>
>> https://issues.apache.org/jira/browse/FLINK-1297
>>
>> If you already have some work done on extending the monitoring capabilities
>> in a branch, it might be good to sync up the development in order to avoid
>> duplicated work (e.g. using the same communication channel used to send the
>> data from the task managers to the job manager).
>>
>> --
>> View this message in context:
>> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-tp2573p2713.html
>> Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
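
To make the reporting flow in Robert's message easier to discuss (stats collected per TaskManager, assembled into a report, sent with the heartbeat, only the latest report kept per TaskManager on the JobManager, then rendered as JSON for the web interface), here is a minimal Java sketch. It is not taken from the flink456 branch: the class names `MetricsSketch`, `JobManagerStore`, the method names, and the choice of metrics are all assumptions made for illustration, and the heartbeat is modeled as a plain method call rather than an Akka message. It uses only the JDK, not the Metrics library.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the flink456 branch.
public class MetricsSketch {

    /** Snapshot of a few JVM-level numbers taken on a TaskManager. */
    static final class MetricsReport {
        // TreeMap keeps keys sorted so the JSON output is deterministic.
        final Map<String, Long> values = new TreeMap<>();

        /** Collects a snapshot from the local JVM using standard JDK APIs. */
        static MetricsReport collect() {
            MetricsReport report = new MetricsReport();
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            report.values.put("heapUsedBytes", heap.getUsed());
            report.values.put("availableProcessors",
                    (long) Runtime.getRuntime().availableProcessors());
            report.values.put("threadCount",
                    (long) ManagementFactory.getThreadMXBean().getThreadCount());
            return report;
        }

        /** Renders the report as a flat JSON object (keys are assumed to need no escaping). */
        String toJson() {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Long> e : values.entrySet()) {
                if (sb.length() > 1) {
                    sb.append(',');
                }
                sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            }
            return sb.append('}').toString();
        }
    }

    /** JobManager side: keeps only the latest report per TaskManager. */
    static final class JobManagerStore {
        private final Map<String, MetricsReport> latest = new ConcurrentHashMap<>();

        /** Called when a heartbeat arrives; overwrites any earlier report. */
        void onHeartbeat(String taskManagerId, MetricsReport report) {
            latest.put(taskManagerId, report);
        }

        /** What the web interface would fetch for the TaskManager overview. */
        String overviewJson(String taskManagerId) {
            MetricsReport report = latest.get(taskManagerId);
            return report == null ? "{}" : report.toJson();
        }
    }

    public static void main(String[] args) {
        JobManagerStore jobManager = new JobManagerStore();
        jobManager.onHeartbeat("tm-1", MetricsReport.collect());
        System.out.println(jobManager.overviewJson("tm-1"));
    }
}
```

Because the store overwrites the previous report on every heartbeat, this sketch has the same "snapshot only" limitation Robert mentions; accumulated counters or a short history would need an extra data structure on the JobManager side.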