Any pointers on this?

Regards
Sumit Chawla
On Mon, Dec 5, 2016 at 8:30 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:

> An example implementation I found is:
> https://github.com/groupon/spark-metrics
>
> Does anyone have experience using this? I am more interested in
> something for PySpark specifically.
>
> The above link pointed to
> https://github.com/apache/spark/blob/master/conf/metrics.properties.template.
> I need to spend some time reading it, but any quick pointers will be
> appreciated.
>
> Regards
> Sumit Chawla
>
> On Mon, Dec 5, 2016 at 8:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>
>> Hi Manish
>>
>> I am specifically looking for something similar to the following:
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/common/index.html#accumulators--counters
>>
>> Flink has this concept of accumulators, where a user can keep custom
>> counters, etc. While the application is executing, these counters are
>> queryable through the REST API provided by the Flink monitoring
>> backend. This way you don't have to wait for the program to complete.
>>
>> Regards
>> Sumit Chawla
>>
>> On Mon, Dec 5, 2016 at 5:53 PM, manish ranjan <cse1.man...@gmail.com> wrote:
>>
>>> http://spark.apache.org/docs/latest/monitoring.html
>>>
>>> You can even install tools like dstat
>>> <http://dag.wieers.com/home-made/dstat/>, iostat
>>> <http://linux.die.net/man/1/iostat>, and iotop
>>> <http://linux.die.net/man/1/iotop>; *collectd* can provide
>>> fine-grained profiling on individual nodes.
>>>
>>> If you are using Mesos as the resource manager, Mesos exposes
>>> metrics for the running job as well.
>>>
>>> ~Manish
>>>
>>> On Mon, Dec 5, 2016 at 4:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>>
>>>> Hi All
>>>>
>>>> I have a long-running job which takes hours and hours to process
>>>> data. How can I monitor the operational efficiency of this job? I am
>>>> interested in something like Storm/Flink-style user
>>>> metrics/aggregators, which I can monitor while my job is running.
>>>> Using these metrics I want to monitor per-partition performance in
>>>> processing items. As of now, the only way for me to get these
>>>> metrics is when the job finishes.
>>>>
>>>> One possibility is for Spark to flush the metrics to an external
>>>> system every few seconds, and thus use that external system to
>>>> monitor these metrics. However, I wanted to see if Spark supports
>>>> any such use case out of the box.
>>>>
>>>> Regards
>>>> Sumit Chawla
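For reference, Spark's closest out-of-the-box analogue to Flink's
accumulators/counters is its own accumulator API, which PySpark exposes.
A minimal sketch follows; the app name and input path are hypothetical.
Note the caveat: unlike Flink's, these values are only readable on the
driver, updated as tasks complete, and are not queryable over a REST API
out of the box.

    # Minimal PySpark sketch of Flink-style counters via accumulators.
    # The app name and input path below are placeholders.
    from pyspark import SparkContext

    sc = SparkContext(appName="accumulator-demo")
    bad_records = sc.accumulator(0)      # counter defined on the driver

    def parse(line):
        try:
            return [int(line)]
        except ValueError:
            bad_records.add(1)           # tasks may only add to it
            return []

    total = sc.textFile("hdfs:///data/input.txt").flatMap(parse).count()
    # Accumulator values can only be read back on the driver.
    print("parsed=%d bad=%d" % (total, bad_records.value))

One more caveat for the per-partition question: an accumulator
aggregates across all tasks, so per-partition numbers would need one
accumulator per partition or an external sink.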
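On the "flush the metrics to an external system every few seconds"
idea: Spark's metrics system supports exactly this out of the box,
configured through the conf/metrics.properties file linked above. A
sketch that reports all components to Graphite every 10 seconds,
assuming a hypothetical endpoint graphite-host:2003:

    # Sketch of conf/metrics.properties; GraphiteSink ships with Spark,
    # but the host, port, and prefix below are placeholders.
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite-host
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds
    *.sink.graphite.prefix=myjob

    # Optionally report JVM metrics from the driver and executors too.
    driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
    executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

CsvSink and ConsoleSink are configured the same way if no Graphite
endpoint is available.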
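Separately, per the monitoring page Manish linked, the driver serves a
REST API (since Spark 1.4, on port 4040 by default) that can be polled
while the job is still running for per-stage progress. A hedged sketch,
assuming the third-party requests library is installed and with
driver-host as a placeholder:

    # Poll the driver's monitoring REST API for live per-stage progress.
    # "driver-host" is a placeholder for the actual driver address.
    import requests

    base = "http://driver-host:4040/api/v1"
    app_id = requests.get(base + "/applications").json()[0]["id"]
    stages = requests.get(base + "/applications/" + app_id + "/stages").json()
    for stage in stages:
        print("stage %s: %s, %d tasks complete" %
              (stage["stageId"], stage["status"], stage["numCompleteTasks"]))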