Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-27 Thread Rohit Karlupia
lps. thanks, rohitk On Tue, Mar 27, 2018 at 9:20 AM, Fawze Abujaber <fawz...@gmail.com> wrote: > Thanks for the update. > > What about cores per executor? > > On Tue, 27 Mar 2018 at 6:45 Rohit Karlupia <roh...@qubole.com> wrote: > >> Thanks Fawze!

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
ill run different from spark job with 1 exec and 3 > cores and for sure the same compare with different exec memory. > > Overall, it is such a good starting point, but it will be a GAME CHANGER > getting these metrics on the tool. > > @Rohit , Huge THANK YOU > > On Mon, Mar 26

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
u could further explain the rest of the output. > > Thanks in advance, > Shmuel > > On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia <roh...@qubole.com> > wrote: > >> Thanks Shmuel for trying out Sparklens! >> >> A couple of things that I noticed: >>

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Rohit Karlupia
> >>>>> On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz < >>>>> shmuel.bl...@similarweb.com> wrote: >>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> Thanks for sharing this great tool. >>>>>

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Rohit Karlupia
rote: > >> Super exciting! I look forward to digging through it this weekend. >> >> On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) < >> ravishankar.n...@gmail.com> wrote: >> >>> Excellent. You filled a missing link. >>> >>

Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Rohit Karlupia
Hi, Happy to announce the availability of Sparklens as an open source project. It helps in understanding the scalability limits of Spark applications and can be a useful guide on the path towards tuning applications for lower runtime or cost. Please clone from here:

Spark Tuning Tool

2018-01-22 Thread Rohit Karlupia
for some interest in the community if people find this work interesting and would like to try it out. thanks, Rohit Karlupia

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the list that I would probably try to work through: 1. Check GC on the offending executor while the task is running. Maybe you need even more memory. 2. Go back to some previous successful run of the job, check the Spark UI for the offending stage, and check max task time/max

Re: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-07 Thread Rohit Karlupia
Last time I checked, this happens only with Spark < 2.0.0. The reason is that ServiceLoader is used to load all FileSystem implementations from the classpath. In Spark < 2.0.0, tachyon.hadoop.TFS was packaged with the Spark distribution and got loaded irrespective of whether it was used or not. Moving to Spark 2.0.0+

Re: Setting Optimal Number of Spark Executor Instances

2017-03-15 Thread Rohit Karlupia
The number of tasks is very likely not the reason for the timeouts. A few things to look for: What is actually timing out? What kind of operation? Is it writing to or reading from HDFS (NameNode or DataNode), fetching shuffle data (with or without the External Shuffle Service), or the driver being unable to talk to an executor?

Re: spark sql jobs heap memory

2016-11-24 Thread Rohit Karlupia
Datasets/DataFrames will use direct/raw/off-heap memory in the most efficient columnar fashion. Trying to fit the same amount of data in heap memory would likely increase your memory requirement and decrease the speed. So, in short, don't worry about it and increase the memory overhead. You can also set a
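
For readers wondering what "increase overhead" means in numbers, here is a rough sketch. It assumes the job runs on YARN (the snippet above does not say which cluster manager is in use), where the off-heap allowance is controlled by spark.yarn.executor.memoryOverhead in Spark 2.x (renamed spark.executor.memoryOverhead from 2.3) and defaults to the larger of 384 MB and 10% of executor memory. The function name and parameters below are made up for illustration and are not part of any Spark API; YARN's rounding of container sizes is not modeled.

```python
def yarn_container_mb(executor_memory_mb, overhead_mb=None,
                      overhead_factor=0.10, min_overhead_mb=384):
    """Approximate memory requested from YARN for one executor, in MB.

    Hypothetical helper for illustration only. Assumes the default
    overhead rule max(384 MB, 10% of executor memory) unless an explicit
    overhead is given.
    """
    if overhead_mb is None:
        # Default rule: 10% of the heap, but never less than 384 MB.
        overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    # Container request = JVM heap + off-heap overhead allowance.
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    print(yarn_container_mb(4096))                    # default overhead
    print(yarn_container_mb(4096, overhead_mb=1024))  # explicitly raised overhead
```

Raising the overhead leaves the heap (spark.executor.memory) unchanged and only enlarges the off-heap allowance, which is where the columnar data described above lives.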