An example implementation I found is: https://github.com/groupon/spark-metrics

Does anyone have any experience using it? I am more interested in something for PySpark specifically.
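The closest built-in analogue I have found in PySpark itself is the accumulator API. Here is a minimal sketch of the kind of counter I mean; the names and the validation check are placeholders of mine:

    from pyspark import SparkContext

    sc = SparkContext(appName="accumulator-demo")

    # A counter that tasks can add to and only the driver can read.
    bad_records = sc.accumulator(0)

    def process(record):
        # Hypothetical per-record check; stands in for real validation logic.
        if record < 0:
            bad_records.add(1)
        return record

    # An action forces evaluation; accumulator updates are merged back
    # on the driver as tasks complete.
    sc.parallelize(range(-5, 100)).map(process).count()
    print("bad records so far: %d" % bad_records.value)

One caveat, as far as I can tell: PySpark accumulators can only be read on the driver (between actions), and unlike Scala's named accumulators (e.g. sc.longAccumulator("name")) they do not show up in the web UI or its /api/v1 REST endpoints, so they only get me part of the way to the Flink-style queryable counters I mention below.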
That spark-metrics link pointed me to https://github.com/apache/spark/blob/master/conf/metrics.properties.template. I need to spend some time reading it, but any quick pointers will be appreciated.

Regards
Sumit Chawla

On Mon, Dec 5, 2016 at 8:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:

> Hi Manish
>
> I am specifically looking for something similar to the following:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/common/index.html#accumulators--counters
>
> Flink has this concept of accumulators, where a user can keep custom
> counters. While the application is executing, these counters are queryable
> through the REST API provided by the Flink monitoring backend. This way
> you don't have to wait for the program to complete.
>
> Regards
> Sumit Chawla
>
> On Mon, Dec 5, 2016 at 5:53 PM, manish ranjan <cse1.man...@gmail.com> wrote:
>
>> http://spark.apache.org/docs/latest/monitoring.html
>>
>> You can even install tools like dstat
>> <http://dag.wieers.com/home-made/dstat/>, iostat
>> <http://linux.die.net/man/1/iostat>, and iotop
>> <http://linux.die.net/man/1/iotop>; *collectd* can also provide
>> fine-grained profiling on individual nodes.
>>
>> If you are using Mesos as the resource manager, Mesos exposes metrics
>> for the running job as well.
>>
>> ~Manish
>>
>> On Mon, Dec 5, 2016 at 4:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>
>>> Hi All
>>>
>>> I have a long-running job which takes hours and hours to process data.
>>> How can I monitor the operational efficiency of this job? I am interested
>>> in something like Storm/Flink-style user metrics/aggregators, which I
>>> could monitor while my job is running. Using these metrics I want to
>>> track per-partition performance in processing items. As of now, the only
>>> way for me to get these metrics is after the job finishes.
>>>
>>> One possibility is that Spark could flush the metrics to an external
>>> system every few seconds, and that external system could then be used to
>>> monitor them. However, I wanted to see whether Spark supports any such
>>> use case out of the box.
>>>
>>> Regards
>>> Sumit Chawla
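On that last point about flushing metrics to an external system every few seconds: Spark's metrics system does support this out of the box through sink definitions in metrics.properties (the template linked above). A minimal sketch that should report all Spark metrics to Graphite every 10 seconds; the host below is a placeholder:

    # Pass via --conf spark.metrics.conf=/path/to/metrics.properties,
    # or place in $SPARK_HOME/conf/metrics.properties.
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds

    # Optionally add JVM instrumentation for the driver and executors.
    driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
    executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

This covers Spark's own instrumentation, though; as far as I can tell it does not pick up PySpark accumulators, which is why I am still looking for a PySpark-specific answer.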