Hey Nils,

I have played around a bit with a little prototype. You can find the code here:
https://github.com/rmetzger/incubator-flink/tree/flink456 (it's another branch in my repo).
You can see the changes that I applied on top of Till's Akka branch here:
https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1

What the code does is collect statistics about each TaskManager in the system. These stats are assembled into a "MetricsReport", which is sent with the periodic heartbeat to the JobManager. The JobManager stores the latest MetricsReport for each TaskManager (in the Instance object for each TM). When the user accesses the TaskManager overview, the latest MetricsReport is sent as a JSONObject to the browser.

To test my changes, check out the code and build it:
    mvn clean package -DskipTests -Dcheckstyle.skip=true
go into the build directory:
    cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
and start the web interface:
    ./bin/start-local.sh
Go to localhost:8081. In the "TaskManager" view, you can see some metrics.
Here is a screenshot: http://img42.com/eNPve

I named my branch after this issue, as it probably best describes what we're working on here: FLINK-456 <https://issues.apache.org/jira/browse/FLINK-456>

As I said in the beginning, it's really just a prototype. Let me know if you have any further questions.

For the "per TaskManager" reports, we should probably integrate some more statistics. Also, the presentation of the numbers is very basic right now. I think there are many good libraries for visualizing these kinds of stats.
Also, the numbers currently represent only a "snapshot"; however, some of the numbers can be accumulated (read/write bytes of the IO manager).
Another missing feature is storing a short history of numbers to visualize metrics over time.

I'm trying to find time to look into "per job" metrics as well. They will require a bit more infrastructure to distinguish them on the JobManager side and to collect them on the TaskManagers.

Best,
Robert

On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <alexander.s.alexand...@gmail.com> wrote:

> Hello Nils,
>
> I am going to work on a similar issue related to tracking some basic
> statistics of the intermediate results produced by dataflows during
> execution.
>
> I just created a Jira issue here:
>
> https://issues.apache.org/jira/browse/FLINK-1297
>
> If you already have some work done on extending the monitoring capabilities
> in a branch, it might be good to sync up the development in order to avoid
> duplicated work (e.g. using the same communication channel used to send the
> data from the task managers to the job manager).
>
> --
> View this message in context:
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-tp2573p2713.html
> Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
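
P.S. To make the heartbeat flow from my mail a bit more concrete (each TaskManager takes a snapshot, attaches it to its heartbeat, the JobManager keeps only the latest report per TaskManager and hands it out as JSON), here is a rough Java sketch. It is not the code from the branch: the names MetricsStore, onHeartbeat, snapshot and latestAsJson, the chosen fields, and the hand-written JSON are all made up for illustration. The prototype keeps the report in the Instance object and sends a JSONObject instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the prototype's MetricsReport:
// an immutable snapshot of a few TaskManager statistics.
final class MetricsReport {
    final long timestampMillis;
    final long heapUsedBytes;
    final long heapMaxBytes;
    final int availableProcessors;

    MetricsReport(long timestampMillis, long heapUsedBytes,
                  long heapMaxBytes, int availableProcessors) {
        this.timestampMillis = timestampMillis;
        this.heapUsedBytes = heapUsedBytes;
        this.heapMaxBytes = heapMaxBytes;
        this.availableProcessors = availableProcessors;
    }

    // Snapshot of the local JVM; a TaskManager would build one of these
    // right before sending its periodic heartbeat.
    static MetricsReport snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new MetricsReport(
                System.currentTimeMillis(),
                rt.totalMemory() - rt.freeMemory(),
                rt.maxMemory(),
                rt.availableProcessors());
    }

    // Hand-written JSON to keep the sketch dependency-free (assumption:
    // a real implementation would use a JSON library).
    String toJson() {
        return "{\"timestamp\":" + timestampMillis
                + ",\"heapUsed\":" + heapUsedBytes
                + ",\"heapMax\":" + heapMaxBytes
                + ",\"processors\":" + availableProcessors + "}";
    }
}

// Hypothetical JobManager-side store: keeps only the latest report
// per TaskManager, so every heartbeat overwrites the previous one.
final class MetricsStore {
    private final Map<String, MetricsReport> latest = new ConcurrentHashMap<>();

    // Called when a heartbeat carrying a report arrives.
    void onHeartbeat(String taskManagerId, MetricsReport report) {
        latest.put(taskManagerId, report);
    }

    // Called by the web frontend; returns "{}" for unknown TaskManagers.
    String latestAsJson(String taskManagerId) {
        MetricsReport report = latest.get(taskManagerId);
        return report == null ? "{}" : report.toJson();
    }
}
```

On the TaskManager side this would be used as store.onHeartbeat(id, MetricsReport.snapshot()) once per heartbeat. Since only the latest report is kept, this matches the "snapshot only" limitation I mentioned; accumulated values or a history would need a different store.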