Hey Nils,

I have played around a bit with a little prototype. You can find the code
here: https://github.com/rmetzger/incubator-flink/tree/flink456 (it's
another branch in my repo).
You can see the changes that I applied on top of Till's Akka branch here:
https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1

The code collects statistics about each TaskManager in the system. These
stats are assembled into a "MetricsReport", which is sent with the periodic
heartbeat to the JobManager. The JobManager stores the latest MetricsReport
for each TaskManager (in the Instance object for each TM).
When the user accesses the TaskManager overview, the latest MetricsReport
is sent as a JSONObject to the browser.
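
To make that flow a bit more concrete, here is a minimal sketch of the idea.
It is not the code from the branch: the Heartbeat and MetricsRegistry classes,
the metric names, and the hand-written JSON rendering are all assumptions made
for illustration (the prototype keeps the report in the Instance object and
uses a JSONObject).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; this only illustrates the data flow.

/** Snapshot of TaskManager statistics, taken when a heartbeat is sent. */
final class MetricsReport {
    private final Map<String, Long> values = new LinkedHashMap<>();

    MetricsReport put(String name, long value) {
        values.put(name, value);
        return this;
    }

    /** Renders a flat JSON object; metric names are assumed to need no escaping. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Long> e : values.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }
}

/** Heartbeat message carrying the sender's id and its current report. */
final class Heartbeat {
    final String taskManagerId;
    final MetricsReport report;

    Heartbeat(String taskManagerId, MetricsReport report) {
        this.taskManagerId = taskManagerId;
        this.report = report;
    }
}

/** JobManager side: keeps only the most recent report per TaskManager. */
final class MetricsRegistry {
    private final Map<String, MetricsReport> latest = new ConcurrentHashMap<>();

    /** Overwrites whatever report was stored for this TaskManager before. */
    void onHeartbeat(Heartbeat hb) {
        latest.put(hb.taskManagerId, hb.report);
    }

    /** What the web frontend would receive for one TaskManager. */
    String latestAsJson(String taskManagerId) {
        MetricsReport r = latest.get(taskManagerId);
        return r == null ? "{}" : r.toJson();
    }
}
```

Each heartbeat simply replaces the previous report, so the JobManager never
holds more than one report per TaskManager.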

To test my changes, check out the code and build it:
  mvn clean package -DskipTests -Dcheckstyle.skip=true
Then change into the build directory:
  cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
and start the web interface:
  ./bin/start-local.sh

Go to localhost:8081. In the "TaskManager" view, you can see some metrics.
Here is a screenshot: http://img42.com/eNPve

I named my branch after this issue, since it probably best describes what
we're working on here: FLINK-456
<https://issues.apache.org/jira/browse/FLINK-456>

As I said in the beginning, it's really just a prototype. Let me know if you
have any further questions.
For the "per TaskManager" reports, we should probably integrate some more
statistics. Also, the presentation of the numbers is very basic right now.
I think there are many good libraries for visualizing these kinds of stats.
Also, the numbers currently represent only a "snapshot"; however, some of
the numbers can be accumulated (e.g. the bytes read/written by the IO
manager).
Another missing feature is storing a little history of numbers to visualize
metrics over time.
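
As a rough idea of what such a history could look like (again only a sketch,
nothing like this exists in the branch): a small bounded buffer per metric
that keeps the last N samples. The MetricHistory class and its methods are
hypothetical. For counters that only grow, such as bytes read, I am assuming
we would want the increase between two consecutive samples rather than the
raw totals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Keeps the last N samples of one metric (hypothetical helper). */
final class MetricHistory {
    private final int capacity;
    private final Deque<Long> samples = new ArrayDeque<>();

    MetricHistory(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Appends a sample and drops the oldest one once the buffer is full. */
    synchronized void add(long value) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(value);
    }

    /** Returns a copy of the stored samples, oldest first. */
    synchronized List<Long> values() {
        return new ArrayList<>(samples);
    }

    /** For cumulative counters: the increase between consecutive samples. */
    synchronized List<Long> deltas() {
        List<Long> out = new ArrayList<>();
        Long previous = null;
        for (Long current : samples) {
            if (previous != null) {
                out.add(current - previous);
            }
            previous = current;
        }
        return out;
    }
}
```

The JobManager could call add() once per heartbeat for each metric, and the
web frontend could plot values() or deltas() depending on the kind of metric.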

I'm trying to find time to look into "per job" metrics as well. They will
require a bit more infrastructure to distinguish them on the JobManager
side and to get them on the TaskManagers.


Best,
Robert



On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
alexander.s.alexand...@gmail.com> wrote:

> Hello Nils,
>
> I am going to work on a similar issue related to tracking some basic
> statistics of the intermediate results produced by dataflows during
> execution.
>
> I just created a Jira issue here:
>
> https://issues.apache.org/jira/browse/FLINK-1297
>
> If you already have some work done on extending the monitoring capabilities
> in a branch, it might be good to sync-up the development in order to avoid
> duplicated work (e.g. using the same communication channel used to send the
> data from the task managers to the job manager).
