Re: Spark Profiler
Hello Jack,

You can also have a look at "Babar" (https://github.com/criteo/babar); it has a nice flame-graph feature too. I haven't had time to test it out yet.

JC

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Spark Profiler
Hi Jack,

You can try sparklens (https://github.com/qubole/sparklens). I don't think it will give details at as low a level as you're looking for, but it can help you identify and remove performance bottlenecks.

~ Hariharan
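For what it's worth, the sparklens README describes enabling it at submit time through a package dependency and an extra listener. A sketch along those lines (the package version is a guess and the application class/jar are placeholders; check the repo for the current coordinates):

```shell
spark-submit \
  --packages qubole:sparklens:0.3.2-s_2.11 \
  --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
  --class com.example.MySparkApp \
  my-spark-app.jar
```

The listener collects scheduling data during the run and prints its analysis (ideal vs. actual executor utilization, critical path) when the application finishes.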
Re: Spark Profiler
Yeah, these options are all very valuable. Just to add another option :) We built a JVM profiler (https://github.com/uber-common/jvm-profiler) to monitor and profile Spark applications at large scale (e.g. sending metrics to Kafka / Hive for batch analysis). People could try it as well.
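For context, the jvm-profiler README describes attaching it to executors as a Java agent at submit time. A rough sketch (jar version, HDFS path, and application class are placeholders, not verified values):

```shell
spark-submit \
  --deploy-mode cluster \
  --conf spark.jars=hdfs:///shared/lib/jvm-profiler-1.0.0.jar \
  --conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar \
  --class com.example.MySparkApp \
  my-spark-app.jar
```

The agent then samples CPU/memory and can be pointed at different reporters (console, Kafka, etc.) via agent arguments; see the README for the exact reporter options.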
Re: Spark Profiler
Thanks for your reply. Your help is very valuable, and all these links are helpful (especially your example).

Best regards,
--Iacovos
RE: Spark Profiler
I find that the Spark metrics system is quite useful for gathering resource utilization metrics of Spark applications, including CPU, memory, and I/O.

If you are interested, there is an example of how this works for us at:
https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark

If instead you are looking at ways to instrument your Spark code with performance metrics, Spark task metrics and event listeners are quite useful for that. See also https://github.com/apache/spark/blob/master/docs/monitoring.md and https://github.com/LucaCanali/sparkMeasure

Regards,
Luca
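To make the metrics-system suggestion concrete: it is configured through conf/metrics.properties. A minimal sketch that ships driver and executor JVM metrics to a Graphite endpoint (the host is a placeholder; the sink and source class names are the ones listed in Spark's monitoring guide):

```properties
# conf/metrics.properties -- minimal Graphite sink sketch
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# placeholder host; point at your own Graphite/InfluxDB endpoint
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

# enable the JVM source (CPU, heap, GC) on driver and executors
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

With this in place the metrics flow continuously while the application runs, which is what makes dashboards like the one in the blog post above possible.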
Re: Spark Profiler
I have found Ganglia very helpful in understanding network I/O, CPU, and memory usage for a given Spark cluster.

I have not used it myself, but I have heard good things about Dr. Elephant (which I think was contributed by LinkedIn, but I'm not 100% sure).

On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis wrote:
> Hello all,
>
> I am looking for a Spark profiler to trace my application and find the
> bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.
>
> I am looking forward to your reply.
>
> --Iacovos
Re: Spark profiler
It would be very helpful if there were such a tool, but the distributed nature may be difficult to capture. I had been trying to run a task where merging the accumulators was taking an inordinately long time, and this was not reflected in the standalone cluster's web UI. What I think would be useful is to publish metrics for the different lifecycle stages of a job to a system like Ganglia to zero in on bottlenecks. Perhaps the user could define some of the metrics in the config. I have been thinking of tinkering with the metrics publisher to add custom metrics and get a bigger picture and time breakdown.

On Mon, Dec 29, 2014 at 10:24 AM, rapelly kartheek <kartheek.m...@gmail.com> wrote:
> Hi,
> I want to find the time taken for replicating an RDD in a Spark cluster,
> along with the computation time on the replicated RDD. Can someone please
> suggest a suitable Spark profiler?
> Thank you
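The per-lifecycle-stage time breakdown described above can be prototyped independently of Spark. Below is a minimal, Spark-agnostic sketch; the `StageTimer` class and the stage names are invented for illustration, and a real integration would instead register a custom source with Spark's metrics system:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Toy registry that accumulates wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds

    @contextmanager
    def timed(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # accumulate, so the same stage can be entered repeatedly
            self.totals[stage] += time.perf_counter() - start

timer = StageTimer()
with timer.timed("compute"):
    sum(i * i for i in range(100_000))
with timer.timed("merge_accumulators"):
    time.sleep(0.05)  # stand-in for the slow merge described above

print(sorted(timer.totals))
```

Publishing `timer.totals` to Ganglia (or any sink) at stage boundaries would surface exactly the kind of hidden cost, like a slow accumulator merge, that the web UI misses.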
Re: Spark profiler
Something like Twitter Ambrose would be lovely to integrate :)

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, May 1, 2014 at 8:44 PM, Punya Biswal <pbis...@palantir.com> wrote:
> Hi all,
>
> I am thinking of starting work on a profiler for Spark clusters. The
> current idea is that it would collect jstacks from executor nodes and put
> them into a central index (either a database or Elasticsearch), and it
> would present them in a UI that lets people slice and dice the jstacks
> based on which job was running at the time and which executor node was
> running it. In addition, the UI would also present time spent doing
> non-computational work, such as shuffling and I/O. As a future extension,
> we might support reading from JMX and/or a JVM agent to get more precise
> data.
>
> I know that it's already possible to use YourKit to profile individual
> processes, but YourKit costs money, needs a desktop client to be
> installed, and doesn't place its data in a context relevant to a Spark
> cluster.
>
> Does something like this already exist (or is such a project already in
> progress)? Do you have any feedback or recommendations for how to go
> about it?
>
> Thanks!
> Punya
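The aggregation step at the heart of the proposal, folding many sampled jstacks into counts you can slice and dice, can be sketched in a few lines. This is an illustrative sketch, not code from any of the tools mentioned; the output uses the semicolon-joined "collapsed stack" format popularized by flame-graph tooling, so the counts could be fed straight into a flame-graph renderer:

```python
from collections import Counter

def collapse_stacks(samples):
    """Fold thread-stack samples (each a root-to-leaf list of frames)
    into a Counter of 'frame1;frame2;...' -> number of samples."""
    counts = Counter()
    for stack in samples:
        counts[";".join(stack)] += 1
    return counts

# Hypothetical samples, as a jstack collector might report them
samples = [
    ["main", "runJob", "shuffleWrite"],
    ["main", "runJob", "shuffleWrite"],
    ["main", "runJob", "compute"],
]
for stack, n in sorted(collapse_stacks(samples).items()):
    print(f"{stack} {n}")
```

Tagging each sample with (job, executor) before counting would give exactly the slice-and-dice dimensions the UI needs.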