I think this (very old) issue describes the feature fairly closely: https://issues.apache.org/jira/browse/FLINK-456

On Thu, Dec 11, 2014 at 8:32 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> Just curious, is there any JIRA filed for this or was it just in
> preliminary proposal talk?
>
> - Henry
>
> On Sun, Dec 7, 2014 at 3:36 PM, Stephan Ewen <se...@apache.org> wrote:
> > That actually sounds like a great idea. I discussed a bit with Robert
> > offline on Friday, and it seems that Metrics has most of what we talked
> > about.
> >
> > I also like the way they make it extensible, so people can capture their
> > own metrics.
> >
> > On Sun, Dec 7, 2014 at 6:02 AM, Henry Saputra <henry.sapu...@gmail.com>
> > wrote:
> >
> >> Hi Robert,
> >>
> >> From what I have seen so far, it is probably better and easier for Flink
> >> to leverage the metrics library [1] for metrics collection rather than
> >> building it organically.
> >>
> >> Several ASF projects like Spark [2] and Tajo [3] have used it with great
> >> success.
> >>
> >> One of the main reasons is maintainability and the breadth of metric
> >> types that could and should be collected.
> >>
> >> - Henry
> >>
> >> [1] https://dropwizard.github.io/metrics/3.1.0/getting-started/
> >> [2] https://spark.apache.org/docs/1.0.1/monitoring.html
> >> [3] https://issues.apache.org/jira/browse/TAJO-333
> >>
> >> On Sat, Dec 6, 2014 at 11:13 AM, Robert Metzger <rmetz...@apache.org>
> >> wrote:
> >> > Hey Nils,
> >> >
> >> > I have played around a bit with a little prototype. You can find the
> >> > code here: https://github.com/rmetzger/incubator-flink/tree/flink456
> >> > (it's another branch in my repo).
> >> > You can see the changes that I applied on top of Till's Akka branch
> >> > here:
> >> >
> >> > https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1
> >> >
> >> > What the code does is collect statistics about each TaskManager in the
> >> > system. These stats are assembled into a "MetricsReport" which is sent
> >> > with the periodic heartbeat to the JobManager. The JobManager stores
> >> > the latest MetricsReport for each TaskManager (in the Instance object
> >> > for each TM).
> >> > When the user accesses the TaskManager overview, the latest
> >> > MetricsReport is sent as a JSONObject to the browser.
> >> >
> >> > To test my changes, check out the code, build it
> >> > mvn clean package -DskipTests -Dcheckstyle.skip=true
> >> > go into
> >> > cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
> >> > and start the web interface
> >> > ./bin/start-local.sh
> >> >
> >> > Go to localhost:8081, in the "TaskManager" view, you can see some
> >> > metrics.
> >> > Here is a screenshot: http://img42.com/eNPve
> >> >
> >> > I named my branch after this issue, as it is probably describing best
> >> > what we're working on here: FLINK-456
> >> > <https://issues.apache.org/jira/browse/FLINK-456>
> >> >
> >> > As I said in the beginning, it's really just a prototype. Let me know
> >> > if you have any further questions.
> >> > For the "per TaskManager" reports, we should probably integrate some
> >> > more statistics. Also, the presentation of the numbers is very basic
> >> > right now. I think there are many good libraries for visualizing these
> >> > kinds of stats.
> >> > Also, the numbers currently represent only a "snapshot"; however, some
> >> > of the numbers can be accumulated (read/write bytes of the io manager).
> >> > Another missing feature is storing a little history of numbers to
> >> > visualize metrics over time.
> >> >
> >> > I'm trying to find time to look into "per job" metrics as well. They
> >> > will require a bit more infrastructure to distinguish them on the
> >> > JobManager side and to get them on the TaskManagers.
> >> >
> >> > Best,
> >> > Robert
> >> >
> >> > On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
> >> > alexander.s.alexand...@gmail.com> wrote:
> >> >
> >> >> Hello Nils,
> >> >>
> >> >> I am going to work on a similar issue related to tracking some basic
> >> >> statistics of the intermediate results produced by dataflows during
> >> >> execution.
> >> >>
> >> >> I just created a Jira issue here:
> >> >>
> >> >> https://issues.apache.org/jira/browse/FLINK-1297
> >> >>
> >> >> If you already have some work done on extending the monitoring
> >> >> capabilities in a branch, it might be good to sync up the development
> >> >> in order to avoid duplicated work (e.g. using the same communication
> >> >> channel used to send the data from the task managers to the job
> >> >> manager).
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-tp2573p2713.html
> >> >> Sent from the Apache Flink (Incubator) Mailing List archive. mailing
> >> >> list archive at Nabble.com.
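
For readers skimming the thread, the flow Robert describes (per-TaskManager stats assembled into a "MetricsReport", shipped with the heartbeat, latest report kept per TaskManager on the JobManager, then rendered as JSON) could be sketched roughly as below. This is only an illustration under assumptions, not the code from the flink456 branch: the class names Heartbeat and JobManagerMetricsStore, all field names, and the hand-rolled JSON are hypothetical, and it uses only the JDK rather than Akka or the Metrics library.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical immutable snapshot of one TaskManager's stats (fields are illustrative). */
final class MetricsReport {
    final String taskManagerId;
    final long timestampMillis;
    final long heapUsedBytes;
    final long heapMaxBytes;
    final int availableProcessors;

    MetricsReport(String taskManagerId, long timestampMillis, long heapUsedBytes,
                  long heapMaxBytes, int availableProcessors) {
        this.taskManagerId = taskManagerId;
        this.timestampMillis = timestampMillis;
        this.heapUsedBytes = heapUsedBytes;
        this.heapMaxBytes = heapMaxBytes;
        this.availableProcessors = availableProcessors;
    }

    /** Takes a point-in-time snapshot of the local JVM via java.lang.Runtime. */
    static MetricsReport collect(String taskManagerId) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return new MetricsReport(taskManagerId, System.currentTimeMillis(),
                used, rt.maxMemory(), rt.availableProcessors());
    }

    /** Minimal JSON rendering; assumes the id needs no escaping. */
    String toJson() {
        return "{\"taskManagerId\":\"" + taskManagerId + "\""
                + ",\"timestampMillis\":" + timestampMillis
                + ",\"heapUsedBytes\":" + heapUsedBytes
                + ",\"heapMaxBytes\":" + heapMaxBytes
                + ",\"availableProcessors\":" + availableProcessors + "}";
    }
}

/** Hypothetical heartbeat message that piggybacks the report. */
final class Heartbeat {
    final String taskManagerId;
    final MetricsReport report;

    Heartbeat(String taskManagerId, MetricsReport report) {
        this.taskManagerId = taskManagerId;
        this.report = report;
    }
}

/** JobManager side (hypothetical): keeps only the latest report per TaskManager. */
final class JobManagerMetricsStore {
    private final Map<String, MetricsReport> latest = new ConcurrentHashMap<>();

    /** Each heartbeat overwrites the previous report for that TaskManager. */
    void onHeartbeat(Heartbeat hb) {
        latest.put(hb.taskManagerId, hb.report);
    }

    /** Returns the most recent report, or empty if none has arrived yet. */
    Optional<MetricsReport> latestFor(String taskManagerId) {
        return Optional.ofNullable(latest.get(taskManagerId));
    }
}
```

Because each heartbeat overwrites the previous entry, this keeps only a snapshot, which matches the limitation Robert mentions: accumulating counters or keeping a short history would need a different store.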