Hi all,

I just wanted to introduce some of my recent work at IBM Research around
Spark, especially its metrics system and Web UI.
As a quick overview of our contributions:
- We have created a new type of metrics Sink (HDFSSink), which persists
the captured metrics to HDFS.
- We have extended the metrics reported by the Executors to include
OS-level metrics on CPU, RAM, disk I/O, and network I/O, using the
Hyperic Sigar library.
- We have extended the Web UI for completed applications to visualize
any of the above metrics the user selects.
These features are configured through the metrics.properties and
spark-defaults.conf files.
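To give a feel for how this slots into Spark's standard metrics configuration, a sketch of a metrics.properties entry might look like the following. Note this is illustrative only: the property names below follow Spark's usual instance.sink.name.property pattern, but the HDFSSink class path and its specific options (dir, period, unit) are assumptions, not the exact keys from our repo.

```properties
# metrics.properties (hypothetical example)
# Route executor metrics to the HDFS-backed sink
# (fully-qualified class name assumed for illustration)
executor.sink.hdfs.class=org.apache.spark.metrics.sink.HDFSSink
# Target directory in HDFS for the captured metrics (assumed option name)
executor.sink.hdfs.dir=hdfs://namenode:9000/spark-metrics
# Polling period for flushing metrics (assumed option names)
executor.sink.hdfs.period=10
executor.sink.hdfs.unit=seconds
```

The repo linked below has the authoritative configuration details.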
We have recorded a small demo showing these capabilities, which you can
find here: https://ibm.app.box.com/s/vyaedlyb444a4zna1215c7puhxliqxdg
There is a blog post with more details on the functionality here:
http://www.spark.tc/sparkoscope-enabling-spark-optimization-through-cross-stack-monitoring-and-visualization-2/
and there is also a public repo where anyone can try it:
https://github.com/ibm-research-ireland/sparkoscope

I would really appreciate any feedback or advice on this work,
especially on whether you think it is worth upstreaming to the official
Spark repository.

Thanks a lot!
