Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-27 Thread Rohit Karlupia
Let me be more specific: With GC/CPU aware task scheduling, the user doesn't have to worry about specifying cores carefully. So if the user always specifies cores = 100 or 1024 for every executor, they will still not get OOM (in the vast majority of cases). Internally, the scheduler will vary the number

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Fawze Abujaber
Thanks for the update. What about cores per executor? On Tue, 27 Mar 2018 at 6:45 Rohit Karlupia wrote: > Thanks Fawze! > > On the memory front, I am currently working on GC and CPU aware task > scheduling. I see wonderful results based on my tests so far. Once the >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
Thanks Fawze! On the memory front, I am currently working on GC and CPU aware task scheduling. I see wonderful results based on my tests so far. Once the feature is complete and available, spark will work with whatever memory is provided (at least enough for the largest possible task). It will

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Fawze Abujaber
Hi Rohit, I would like to thank you for the unlimited patience and support that you are providing here and behind the scenes for all of us. The tool is amazing and easy to use, and most of the metrics are easy to understand ... Thinking about whether we need to run it in cluster mode and all the time, I think we can

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
Hi Shmuel, In general it is hard to pinpoint the exact code that is responsible for a specific stage. For example, when using Spark SQL, depending on the kinds of joins and aggregations used in a single-line query, we will have multiple stages in the Spark application. I usually try to

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
Hi Rohit, Thanks for the analysis. I can use repartition on the slow task. But how can I tell what part of the code is in charge of the slow tasks? It would be great if you could further explain the rest of the output. Thanks in advance, Shmuel On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Rohit Karlupia
Thanks Shmuel for trying out Sparklens! A couple of things that I noticed: 1) 250 executors is probably overkill for this job. It would run in the same time with around 100. 2) Many of the stages that take a long time have only 200 tasks, whereas we have 750 cores available for the job. 200 is the default
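
To put numbers on the two observations above, here is a minimal sketch of the arithmetic, not anything produced by Sparklens itself. It assumes the 750 cores are spread evenly over the 250 executors (3 cores each) and that every task occupies exactly one core; the helper name `max_core_utilization` is made up for illustration. The message is cut off after "200 is the default"; if that refers to Spark SQL's shuffle partition count (`spark.sql.shuffle.partitions`, whose default is 200), then raising it or repartitioning would be one way to give those stages more tasks, but that reading is an assumption.

```python
def max_core_utilization(num_tasks: int, total_cores: int) -> float:
    """Upper bound on the fraction of cores one stage can keep busy at a time.

    Assumes each task occupies exactly one core (hypothetical helper).
    """
    if num_tasks < 0 or total_cores <= 0:
        raise ValueError("num_tasks must be >= 0 and total_cores must be > 0")
    return min(num_tasks, total_cores) / total_cores


# Figures quoted in the message.
executors = 250
total_cores = 750
stage_tasks = 200

# Assumption: cores are distributed evenly across executors.
cores_per_executor = total_cores // executors

utilization = max_core_utilization(stage_tasks, total_cores)
idle_cores = total_cores - min(stage_tasks, total_cores)

# The suggested ~100 executors, assuming the same cores per executor.
smaller_cores = 100 * cores_per_executor
smaller_utilization = max_core_utilization(stage_tasks, smaller_cores)

print(f"cores per executor: {cores_per_executor}")
print(f"250 executors: at most {utilization:.1%} of cores busy, {idle_cores} idle")
print(f"100 executors: at most {smaller_utilization:.1%} of cores busy")
```

Under these assumptions a 200-task stage still fits in a single wave on 300 cores, which is consistent with the remark that the job would run in about the same time with around 100 executors.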

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
I ran it on a single job. SparkLens has an overhead on the job duration. I'm not ready to enable it by default on all our jobs. Attached is the output. Still trying to understand what exactly it means. On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber wrote: > Nice! > >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Fawze Abujaber
Nice! Shmuel, were you able to run it at the cluster level or for a specific job? Did you configure it in spark-defaults.conf? On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz wrote: > Just to let you know, I have managed to run SparkLens on our cluster. > > I switched to

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
Just to let you know, I have managed to run SparkLens on our cluster. I switched to the spark_1.6 branch, and also compiled against the specific image of Spark we are using (cdh5.7.6). Now I need to figure out what the output means... :P Shmuel On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-23 Thread Fawze Abujaber
Quick question: how do I add --jars /path/to/sparklens_2.11-0.1.0.jar to spark-defaults.conf? Should it be: spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar, or should I use the spark.jars option? Could anyone give an example of how it should look, and if i the path for the
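
For reference, the two candidates named in the question can be written out as `spark-defaults.conf` entries. The sketch below only renders them side by side; it does not settle which one suits a given cluster. What is known from Spark itself: `spark.jars` is the property that the `--jars` flag of spark-submit sets, while `spark.driver.extraClassPath` affects only the driver's classpath. The helper names `conf_line` and `candidate_entries` are made up for illustration, and the jar path is the placeholder from the message.

```python
# Placeholder path exactly as written in the message; replace with a real location.
JAR_PATH = "/path/to/sparklens_2.11-0.1.0.jar"


def conf_line(key: str, value: str) -> str:
    """Format one spark-defaults.conf entry: property name, whitespace, value."""
    return f"{key} {value}"


def candidate_entries(jar_path: str) -> dict:
    """Render the two options from the question as spark-defaults.conf lines."""
    return {
        # Adds the jar to the driver's classpath only.
        "driver_classpath": conf_line("spark.driver.extraClassPath", jar_path),
        # Configuration-file equivalent of passing --jars to spark-submit.
        "spark_jars": conf_line("spark.jars", jar_path),
    }


def submit_args(jar_path: str) -> list:
    """Per-job alternative: the command-line form quoted in the question."""
    return ["--jars", jar_path]


for name, line in candidate_entries(JAR_PATH).items():
    print(f"{name}: {line}")
print("spark-submit " + " ".join(submit_args(JAR_PATH)))
```

Setting either entry in `spark-defaults.conf` applies it to every job submitted with that configuration, whereas the `--jars` form applies to a single submission.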

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Fawze Abujaber
Hi Shmuel, Did you compile the code against the right branch for Spark 1.6? I tested it and it looks like it is working, and now I'm testing the branch more widely. Please use the branch for Spark 1.6. On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz wrote: > Hi Rohit, > >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Shmuel Blitz
Hi Rohit, Thanks for sharing this great tool. I tried running a Spark job with the tool, but it failed with an *IncompatibleClassChangeError* exception. I have opened an issue on GitHub (https://github.com/qubole/sparklens/issues/1). Shmuel On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Shmuel Blitz
Thanks. We will give this a try and report back. Shmuel On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia wrote: > Thanks everyone! > Please share how it works and how it doesn't. Both help. > > Fawaze, just made few changes to make this work with spark 1.6. Can you > please

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Rohit Karlupia
Thanks everyone! Please share how it works and how it doesn't. Both help. Fawze, I just made a few changes to make this work with Spark 1.6. Can you please try building from branch *spark_1.6*? Thanks, rohitk On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber wrote: > It's

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Fawze Abujaber
It's super amazing. I see it was tested on Spark 2.0.0 and above; what about Spark 1.6, which is still part of Cloudera's main versions? We have a vast number of Spark applications on version 1.6.0. On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau wrote: > Super exciting! I look

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Holden Karau
Super exciting! I look forward to digging through it this weekend. On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > Excellent. You filled a missing link. > > Best, > Passion > > On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread रविशंकर नायर
Excellent. You filled a missing link. Best, Passion On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia wrote: > Hi, > > Happy to announce the availability of Sparklens as open source project. It > helps in understanding the scalability limits of spark applications and > can

Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Rohit Karlupia
Hi, Happy to announce the availability of Sparklens as an open source project. It helps in understanding the scalability limits of Spark applications and can be a useful guide on the path towards tuning applications for lower runtime or cost. Please clone from here: