An example implementation I found is: https://github.com/groupon/spark-metrics

Does anyone have any experience using it? I am more interested in something for PySpark specifically.
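The closest built-in analogue I have found in PySpark itself is the accumulator API. Here is a minimal sketch of the kind of counter I mean; the names and the validation check are placeholders of mine:

    from pyspark import SparkContext

    sc = SparkContext(appName="accumulator-demo")

    # A counter that tasks can add to and only the driver can read.
    bad_records = sc.accumulator(0)

    def process(record):
        # Hypothetical per-record check; stands in for real validation logic.
        if record < 0:
            bad_records.add(1)
        return record

    # An action forces evaluation; accumulator updates are merged back
    # on the driver as tasks complete.
    sc.parallelize(range(-5, 100)).map(process).count()
    print("bad records so far: %d" % bad_records.value)

One caveat, as far as I can tell: PySpark accumulators can only be read on the driver (between actions), and unlike Scala's named accumulators (e.g. sc.longAccumulator("name")) they do not show up in the web UI or its /api/v1 REST endpoints, so they only get me part of the way to the Flink-style queryable counters I mention below.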
That spark-metrics link pointed me to https://github.com/apache/spark/blob/master/conf/metrics.properties.template. I need to spend some time reading it, but any quick pointers will be appreciated.

Regards
Sumit Chawla

On Mon, Dec 5, 2016 at 8:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:

> Hi Manish
>
> I am specifically looking for something similar to the following:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/common/index.html#accumulators--counters
>
> Flink has this concept of accumulators, where a user can keep custom
> counters. While the application is executing, these counters are queryable
> through the REST API provided by the Flink monitoring backend. This way
> you don't have to wait for the program to complete.
>
> Regards
> Sumit Chawla
>
> On Mon, Dec 5, 2016 at 5:53 PM, manish ranjan <cse1.man...@gmail.com> wrote:
>
>> http://spark.apache.org/docs/latest/monitoring.html
>>
>> You can even install tools like dstat
>> <http://dag.wieers.com/home-made/dstat/>, iostat
>> <http://linux.die.net/man/1/iostat>, and iotop
>> <http://linux.die.net/man/1/iotop>; *collectd* can also provide
>> fine-grained profiling on individual nodes.
>>
>> If you are using Mesos as the resource manager, Mesos exposes metrics
>> for the running job as well.
>>
>> ~Manish
>>
>> On Mon, Dec 5, 2016 at 4:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>
>>> Hi All
>>>
>>> I have a long-running job which takes hours and hours to process data.
>>> How can I monitor the operational efficiency of this job? I am interested
>>> in something like Storm/Flink-style user metrics/aggregators, which I
>>> could monitor while my job is running. Using these metrics I want to
>>> track per-partition performance in processing items. As of now, the only
>>> way for me to get these metrics is after the job finishes.
>>>
>>> One possibility is that Spark could flush the metrics to an external
>>> system every few seconds, and that external system could then be used to
>>> monitor them. However, I wanted to see whether Spark supports any such
>>> use case out of the box.
>>>
>>> Regards
>>> Sumit Chawla
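On that last point about flushing metrics to an external system every few seconds: Spark's metrics system does support this out of the box through sink definitions in metrics.properties (the template linked above). A minimal sketch that should report all Spark metrics to Graphite every 10 seconds; the host below is a placeholder:

    # Pass via --conf spark.metrics.conf=/path/to/metrics.properties,
    # or place in $SPARK_HOME/conf/metrics.properties.
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds

    # Optionally add JVM instrumentation for the driver and executors.
    driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
    executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

This covers Spark's own instrumentation, though; as far as I can tell it does not pick up PySpark accumulators, which is why I am still looking for a PySpark-specific answer.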