I think this (very old) issue describes the feature fairly closely:
https://issues.apache.org/jira/browse/FLINK-456



On Thu, Dec 11, 2014 at 8:32 AM, Henry Saputra <henry.sapu...@gmail.com>
wrote:

> Just curious, is there any JIRA filed for this or was it just in
> preliminary proposal talk?
>
> - Henry
>
> On Sun, Dec 7, 2014 at 3:36 PM, Stephan Ewen <se...@apache.org> wrote:
> > That actually sounds like a great idea. I discussed a bit with Robert
> > offline on Friday, and it seems that Metrics has most of what we talked
> > about.
> >
> > I also like the way they make it extensible, so people can capture their
> > own metrics.
> >
> > On Sun, Dec 7, 2014 at 6:02 AM, Henry Saputra <henry.sapu...@gmail.com>
> > wrote:
> >
> >> Hi Robert,
> >>
> >> From what I have seen so far, it is probably better and easier for Flink
> >> to leverage the Metrics library [1] for metrics collection rather than
> >> building it organically.
> >>
> >> Several ASF projects like Spark [2] and Tajo [3] have used it with great
> >> success.
> >>
> >> The main reasons are maintainability and the breadth of metric types
> >> that could and should be collected.
> >>
> >> - Henry
> >>
> >> [1] https://dropwizard.github.io/metrics/3.1.0/getting-started/
> >> [2] https://spark.apache.org/docs/1.0.1/monitoring.html
> >> [3] https://issues.apache.org/jira/browse/TAJO-333
> >>
> >> On Sat, Dec 6, 2014 at 11:13 AM, Robert Metzger <rmetz...@apache.org>
> >> wrote:
> >> > Hey Nils,
> >> >
> >> > I have played around a bit with a little prototype. You can find the code
> >> > here: https://github.com/rmetzger/incubator-flink/tree/flink456 (it's
> >> > another branch in my repo).
> >> > You can see the changes that I applied on top of Till's Akka branch here:
> >> > https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1
> >> >
> >> > The code collects statistics about each TaskManager in the system.
> >> > These stats are assembled into a "MetricsReport", which is sent with
> >> > the periodic heartbeat to the JobManager. The JobManager stores the
> >> > latest MetricsReport for each TaskManager (in the Instance object for
> >> > each TM).
> >> > When the user accesses the TaskManager overview, the latest MetricsReport
> >> > is sent as a JSONObject to the browser.
> >> >
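A minimal Python sketch of the flow described above, for readers who want something concrete: each TaskManager attaches its latest report to the periodic heartbeat, the JobManager keeps only the newest report per TaskManager, and the web view serializes it to JSON. All names here (`MetricsReport` fields, `JobManager.on_heartbeat`, `build_report`, the metric keys) are hypothetical illustrations and are not taken from the prototype branch.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class MetricsReport:
    """Snapshot of one TaskManager's stats (field names are assumptions)."""
    task_manager_id: str
    timestamp: float
    metrics: dict = field(default_factory=dict)


class JobManager:
    """Keeps only the latest report per TaskManager."""

    def __init__(self):
        self._latest = {}

    def on_heartbeat(self, report):
        # Each heartbeat overwrites the previous report for that TaskManager.
        self._latest[report.task_manager_id] = report

    def task_manager_overview_json(self, task_manager_id):
        # What the web frontend would receive when the overview is opened.
        return json.dumps(asdict(self._latest[task_manager_id]), sort_keys=True)


def build_report(task_manager_id, now=None):
    # Placeholder values; a real TaskManager would read these from the runtime.
    return MetricsReport(
        task_manager_id=task_manager_id,
        timestamp=time.time() if now is None else now,
        metrics={"heap_used_bytes": 123456, "io_bytes_read": 0},
    )


jm = JobManager()
jm.on_heartbeat(build_report("tm-1", now=1.0))
jm.on_heartbeat(build_report("tm-1", now=2.0))  # replaces the first report
overview = json.loads(jm.task_manager_overview_json("tm-1"))
```

Because the report rides on a message that is sent anyway, no extra communication channel is needed; the cost is that the freshness of the numbers is bounded by the heartbeat interval.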
> >> > To test my changes, check out the code and build it:
> >> >  mvn clean package -DskipTests -Dcheckstyle.skip=true
> >> > Go into the build directory:
> >> >  cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
> >> > and start the web interface:
> >> >  ./bin/start-local.sh
> >> >
> >> > Go to localhost:8081; in the "TaskManager" view, you can see some metrics.
> >> > Here is a screenshot: http://img42.com/eNPve
> >> >
> >> > I named my branch after this issue, as it probably best describes what
> >> > we're working on here: FLINK-456
> >> > <https://issues.apache.org/jira/browse/FLINK-456>
> >> >
> >> > As I said in the beginning, it's really just a prototype. Let me know if
> >> > you have any further questions.
> >> > For the "per TaskManager" reports, we should probably integrate some more
> >> > statistics. Also, the presentation of the numbers is very basic right
> >> > now. I think there are many good libraries for visualizing these kinds of
> >> > stats.
> >> > Also, the numbers currently represent only a "snapshot"; however, some of
> >> > the numbers can be accumulated (read/write bytes of the I/O manager).
> >> > Another missing feature is storing a little history of numbers to
> >> > visualize metrics over time.
> >> >
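The two missing pieces mentioned above (accumulating counter-style numbers and keeping a short history) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the class name `MetricHistory`, the metric keys, and the idea that each snapshot reports bytes moved since the previous report are all hypothetical and not part of the prototype.

```python
from collections import deque


class MetricHistory:
    """Bounded history of snapshots plus running totals for counter-style metrics."""

    def __init__(self, max_points=60, accumulated=("io_bytes_read", "io_bytes_written")):
        self.points = deque(maxlen=max_points)  # oldest entries are dropped automatically
        self.accumulated = set(accumulated)
        self.totals = {name: 0 for name in accumulated}

    def record(self, timestamp, snapshot):
        # Assumes each snapshot reports the bytes moved since the previous report.
        for name in self.accumulated:
            self.totals[name] += snapshot.get(name, 0)
        self.points.append((timestamp, dict(snapshot)))

    def series(self, name):
        # (timestamp, value) pairs, e.g. for plotting a metric over time.
        return [(ts, snap[name]) for ts, snap in self.points if name in snap]


history = MetricHistory(max_points=3)
for t, read in enumerate([10, 20, 30, 40]):
    history.record(t, {"io_bytes_read": read, "heap_used_bytes": 1000 + t})
```

With `max_points=3`, the fourth `record` call evicts the oldest snapshot from the history, while the running total still covers all four reports.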
> >> > I'm trying to find time to look into "per job" metrics as well. They will
> >> > require a bit more infrastructure to distinguish them on the JobManager
> >> > side and to get them on the TaskManagers.
> >> >
> >> >
> >> > Best,
> >> > Robert
> >> >
> >> >
> >> >
> >> > On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
> >> > alexander.s.alexand...@gmail.com> wrote:
> >> >
> >> >> Hello Nils,
> >> >>
> >> >> I am going to work on a similar issue related to tracking some basic
> >> >> statistics of the intermediate results produced by dataflows during
> >> >> execution.
> >> >>
> >> >> I just created a Jira issue here:
> >> >>
> >> >> https://issues.apache.org/jira/browse/FLINK-1297
> >> >>
> >> >> If you already have some work done on extending the monitoring
> >> >> capabilities in a branch, it might be good to sync up the development in
> >> >> order to avoid duplicated work (e.g. using the same communication channel
> >> >> used to send the data from the task managers to the job manager).
> >> >>
> >> >>
> >> >>
> >> >>
> >>
>
