Re: Spark Profiler

2019-03-29 Thread jcdauchy
Hello Jack,

You can also have a look at “Babar”; it has a nice “flame graph” feature
too. I haven’t had the time to test it out yet.

https://github.com/criteo/babar
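
If you do try it: agent-based profilers like this are usually attached through
the executor JVM options. A rough sketch of the spark-submit invocation (the
jar name, path and agent arguments below are placeholders — take the exact
values from Babar's README):

  spark-submit \
    --files ./babar-agent.jar \
    --conf "spark.executor.extraJavaOptions=-javaagent:./babar-agent.jar" \
    --class com.example.MyApp my-app.jar

The --files option ships the agent jar into each executor's working directory,
which is how the relative -javaagent path can resolve there.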

JC







Re: Spark Profiler

2019-03-29 Thread Hariharan
Hi Jack,

You can try sparklens (https://github.com/qubole/sparklens). I think it
won't give details at as low a level as you're looking for, but it can help
you identify and remove performance bottlenecks.
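
If it helps, sparklens hooks into the application as an extra Spark listener.
Roughly (the package version and the application class/jar are placeholders —
check the sparklens README for the current coordinates):

  spark-submit \
    --packages qubole:sparklens:0.3.2-s_2.11 \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    --class com.example.MyApp my-app.jar

It then prints its report (scheduling delay, critical path, simulated scaling)
at the end of the run.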

~ Hariharan

On Fri, Mar 29, 2019 at 12:01 AM bo yang  wrote:

> Yeah, these options are very valuable. Just to add another option :) We built
> a JVM profiler (https://github.com/uber-common/jvm-profiler) to monitor
> and profile Spark applications at large scale (e.g. sending metrics to
> Kafka / Hive for batch analysis). People could try it as well.
>
>
> On Wed, Mar 27, 2019 at 1:49 PM Jack Kolokasis 
> wrote:
>
>> Thanks for your reply. Your help is very valuable, and all these links
>> are helpful (especially your example).
>>
>> Best Regards
>>
>> --Iacovos
>> On 3/27/19 10:42 PM, Luca Canali wrote:
>>
>> I find that the Spark metrics system is quite useful to gather resource
>> utilization metrics of Spark applications, including CPU, memory and I/O.
>>
>> If you are interested, there is an example of how this works for us at:
>> https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
>> If instead you are looking for ways to instrument your Spark code
>> with performance metrics, Spark task metrics and event listeners are quite
>> useful for that. See also
>> https://github.com/apache/spark/blob/master/docs/monitoring.md and
>> https://github.com/LucaCanali/sparkMeasure
>>
>>
>>
>> Regards,
>>
>> Luca
>>
>>
>>
>> *From:* manish ranjan  
>> *Sent:* Tuesday, March 26, 2019 15:24
>> *To:* Jack Kolokasis  
>> *Cc:* user  
>> *Subject:* Re: Spark Profiler
>>
>>
>>
>> I have found Ganglia very helpful in understanding network I/O, CPU, and
>> memory usage for a given Spark cluster.
>>
>> I have not used it, but I have heard good things about Dr. Elephant (which I
>> think was contributed by LinkedIn, though I'm not 100% sure).
>>
>>
>>
>> On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis 
>> wrote:
>>
>> Hello all,
>>
>>  I am looking for a Spark profiler to trace my application and find
>> the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.
>>
>> I am looking forward to your reply.
>>
>> --Iacovos
>>
>>
>>
>>


Re: Spark Profiler

2019-03-28 Thread bo yang
Yeah, these options are very valuable. Just to add another option :) We built
a JVM profiler (https://github.com/uber-common/jvm-profiler) to monitor and
profile Spark applications at large scale (e.g. sending metrics to Kafka /
Hive for batch analysis). People could try it as well.
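
For anyone who wants to try it: the profiler is attached to the executors as a
java agent. A rough sketch (the jar name/path are placeholders, and the agent
arguments that select a reporter — console, Kafka, etc. — are described in the
project README):

  spark-submit \
    --deploy-mode cluster \
    --conf spark.jars=hdfs:///lib/jvm-profiler-1.0.0.jar \
    --conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar \
    --class com.example.MyApp my-app.jar

On YARN, the jar listed in spark.jars is localized into the container working
directory before the executor JVM starts, which is how the relative -javaagent
path can resolve.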


On Wed, Mar 27, 2019 at 1:49 PM Jack Kolokasis 
wrote:

> Thanks for your reply. Your help is very valuable, and all these links are
> helpful (especially your example).
>
> Best Regards
>
> --Iacovos
> On 3/27/19 10:42 PM, Luca Canali wrote:
>
> I find that the Spark metrics system is quite useful to gather resource
> utilization metrics of Spark applications, including CPU, memory and I/O.
>
> If you are interested, there is an example of how this works for us at:
> https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
> If instead you are looking for ways to instrument your Spark code
> with performance metrics, Spark task metrics and event listeners are quite
> useful for that. See also
> https://github.com/apache/spark/blob/master/docs/monitoring.md and
> https://github.com/LucaCanali/sparkMeasure
>
>
>
> Regards,
>
> Luca
>
>
>
> *From:* manish ranjan  
> *Sent:* Tuesday, March 26, 2019 15:24
> *To:* Jack Kolokasis  
> *Cc:* user  
> *Subject:* Re: Spark Profiler
>
>
>
> I have found Ganglia very helpful in understanding network I/O, CPU, and
> memory usage for a given Spark cluster.
>
> I have not used it, but I have heard good things about Dr. Elephant (which I
> think was contributed by LinkedIn, though I'm not 100% sure).
>
>
>
> On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis 
> wrote:
>
> Hello all,
>
>  I am looking for a Spark profiler to trace my application and find
> the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.
>
> I am looking forward to your reply.
>
> --Iacovos
>
>
>
>


Re: Spark Profiler

2019-03-27 Thread Jack Kolokasis
Thanks for your reply. Your help is very valuable, and all these links
are helpful (especially your example).


Best Regards

--Iacovos

On 3/27/19 10:42 PM, Luca Canali wrote:


I find that the Spark metrics system is quite useful to gather 
resource utilization metrics of Spark applications, including CPU, 
memory and I/O.


If you are interested, there is an example of how this works for us at:
https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark

If instead you are looking for ways to instrument your Spark
code with performance metrics, Spark task metrics and event listeners 
are quite useful for that. See also 
https://github.com/apache/spark/blob/master/docs/monitoring.md and 
https://github.com/LucaCanali/sparkMeasure


Regards,

Luca

*From:* manish ranjan 
*Sent:* Tuesday, March 26, 2019 15:24
*To:* Jack Kolokasis 
*Cc:* user 
*Subject:* Re: Spark Profiler

I have found Ganglia very helpful in understanding network I/O, CPU,
and memory usage for a given Spark cluster.

I have not used it, but I have heard good things about Dr. Elephant (which
I think was contributed by LinkedIn, though I'm not 100% sure).


On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis <koloka...@ics.forth.gr> wrote:


Hello all,

 I am looking for a Spark profiler to trace my application and find
the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.

I am looking forward to your reply.

--Iacovos





RE: Spark Profiler

2019-03-27 Thread Luca Canali
I find that the Spark metrics system is quite useful to gather resource 
utilization metrics of Spark applications, including CPU, memory and I/O.
If you are interested, there is an example of how this works for us at:
https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
If instead you are looking for ways to instrument your Spark code with
performance metrics, Spark task metrics and event listeners are quite useful 
for that. See also 
https://github.com/apache/spark/blob/master/docs/monitoring.md and 
https://github.com/LucaCanali/sparkMeasure
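
As a small illustration of the task-metrics route (this is just a minimal
sketch of a custom listener, not code from the blog post), something like the
following logs a few per-task resource metrics as tasks complete:

  import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

  // Minimal sketch: log per-task run time, CPU, GC, shuffle and spill metrics.
  class TaskMetricsLogger extends SparkListener {
    override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
      val m = taskEnd.taskMetrics
      if (m != null) {
        println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
          s"runTimeMs=${m.executorRunTime} cpuTimeNs=${m.executorCpuTime} " +
          s"gcTimeMs=${m.jvmGCTime} " +
          s"shuffleReadBytes=${m.shuffleReadMetrics.totalBytesRead} " +
          s"shuffleWriteBytes=${m.shuffleWriteMetrics.bytesWritten} " +
          s"memorySpilled=${m.memoryBytesSpilled} diskSpilled=${m.diskBytesSpilled}")
      }
    }
  }

  // Register with spark.sparkContext.addSparkListener(new TaskMetricsLogger),
  // or via --conf spark.extraListeners=<fully.qualified.ClassName>.

sparkMeasure builds on these same listener and task-metrics APIs, so it saves
you from writing this boilerplate yourself.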

Regards,
Luca

From: manish ranjan 
Sent: Tuesday, March 26, 2019 15:24
To: Jack Kolokasis 
Cc: user 
Subject: Re: Spark Profiler

I have found Ganglia very helpful in understanding network I/O, CPU, and memory
usage for a given Spark cluster.
I have not used it, but I have heard good things about Dr. Elephant (which I think
was contributed by LinkedIn, though I'm not 100% sure).

On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis <koloka...@ics.forth.gr> wrote:
Hello all,

 I am looking for a Spark profiler to trace my application and find
the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.

I am looking forward to your reply.

--Iacovos




Re: Spark Profiler

2019-03-26 Thread manish ranjan
I have found Ganglia very helpful in understanding network I/O, CPU, and
memory usage for a given Spark cluster.
I have not used it, but I have heard good things about Dr. Elephant (which I
think was contributed by LinkedIn, though I'm not 100% sure).

On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis  wrote:

> Hello all,
>
>  I am looking for a Spark profiler to trace my application and find
> the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.
>
> I am looking forward to your reply.
>
> --Iacovos
>
>
>
>


Re: Spark profiler

2014-12-29 Thread Boromir Widas
Such a tool would be very helpful, but the distributed nature of Spark may be
difficult to capture.

I had been trying to run a task where merging the accumulators was taking
an inordinately long time, and this was not reflected in the standalone
cluster's web UI.
What I think would be useful is publishing metrics for different lifecycle
stages of a job to a system like Ganglia, to zero in on bottlenecks. Perhaps
the user could define some of the metrics in the config.

I have been thinking of tinkering with the metrics publisher to add custom
metrics to get a bigger picture and time breakdown.
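
For the publishing side, the built-in metrics sinks already get you part of the
way. For example, a Graphite sink configured in conf/metrics.properties (Ganglia
needs the separate spark-ganglia-lgpl build, so Graphite is shown here; the host
and port are placeholders):

  *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
  *.sink.graphite.host=graphite.example.com
  *.sink.graphite.port=2003
  *.sink.graphite.period=10
  *.sink.graphite.unit=seconds
  # also expose JVM/GC metrics from the driver and executors
  *.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Custom, job-lifecycle-level metrics would still need the kind of tinkering
described above.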

On Mon, Dec 29, 2014 at 10:24 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:

 Hi,

 I want to find the time taken to replicate an RDD in a Spark cluster,
 along with the computation time on the replicated RDD.

 Can someone please suggest a suitable Spark profiler?

 Thank you
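
On the specific question quoted above, a rough way to get ballpark numbers from
application code (a sketch only — 'rdd' is a placeholder, and note that the
first count measures computation and replication together, since persist
replicates blocks as they are first computed):

  import org.apache.spark.storage.StorageLevel

  def timeMs[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body
    println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
    result
  }

  // 2x in-memory replication of the RDD's blocks
  val replicated = rdd.persist(StorageLevel.MEMORY_ONLY_2)
  timeMs("compute + replicate")(replicated.count())
  timeMs("computation on replicated rdd")(replicated.map(_.hashCode).count())

Isolating the pure replication cost is harder; that is more the territory of
block manager / storage metrics than of wall-clock timing.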



Re: Spark profiler

2014-05-01 Thread Mayur Rustagi
Something like Twitter's Ambrose would be lovely to integrate :)


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Thu, May 1, 2014 at 8:44 PM, Punya Biswal pbis...@palantir.com wrote:

 Hi all,

 I am thinking of starting work on a profiler for Spark clusters. The
 current idea is that it would collect jstacks from executor nodes and put
 them into a central index (either a database or Elasticsearch), and it
 would present them in a UI that lets people slice and dice the jstacks
 based on what job was running at the time and on what executor node. In
 addition, the UI would also present time spent on non-computational work,
 such as shuffling and input/output (I/O). In a future extension, we might
 support reading from JMX and/or a JVM agent to get more precise data.

 I know that it's already possible to use YourKit to profile individual
 processes, but YourKit costs money, needs a desktop client to be installed,
 and doesn't place its data in the context relevant to a Spark cluster.

 Does something like this already exist (or is such a project already in
 progress)? Do you have any feedback or recommendations for how to go about
 it?

 Thanks!
 Punya
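
The core collection step is simple enough to sketch: periodically take an
in-process, jstack-like dump and hand it to whatever ships it to the central
index. A plain-JDK sketch (the interval and the sink callback are placeholders):

  import java.util.concurrent.{Executors, TimeUnit}
  import scala.collection.JavaConverters._

  object StackSampler {
    // Build a jstack-like text dump of every live thread in this JVM.
    def dump(): String =
      Thread.getAllStackTraces.asScala.map { case (thread, frames) =>
        s"\"${thread.getName}\" state=${thread.getState}\n" +
          frames.map(f => s"    at $f").mkString("\n")
      }.mkString("\n\n")

    // Capture a dump every `intervalSeconds` and pass it to `sink`
    // (e.g. a function that posts the text to Elasticsearch).
    def start(intervalSeconds: Long)(sink: String => Unit): Unit = {
      val scheduler = Executors.newSingleThreadScheduledExecutor()
      scheduler.scheduleAtFixedRate(
        new Runnable { def run(): Unit = sink(dump()) },
        0L, intervalSeconds, TimeUnit.SECONDS)
    }
  }

  // Example: StackSampler.start(30)(println)

The harder parts are the ones described above: tagging each dump with the job,
stage and executor it came from, and building the UI to slice across them.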