Re: Spark executor Memory profiling

2016-03-01 Thread Nirav Patel
Thanks Nilesh for sharing those docs. I have come across most of those tuning tips in the past and, believe me, I have tuned the heck out of this job. What I can't believe is that Spark needs 4x more resources than MapReduce to run the same job (for a dataset on the order of >100GB). I was able to run my job

Re: Spark executor Memory profiling

2016-02-20 Thread Nirav Patel
Thanks Nilesh. I don't think there's heavy communication between the driver and the executors. However, I'll try the settings you suggested. I cannot replace groupBy with reduceByKey as it's not an associative operation. It is very frustrating, to be honest. It was a piece of cake with MapReduce compared to
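For context on the groupBy vs. reduceByKey point above, a minimal Scala sketch of why that substitution only works for associative operations. The RDD, key type, and the median computation are hypothetical illustrations, not taken from the thread:

```scala
import org.apache.spark.rdd.RDD

def example(events: RDD[(String, Double)]): Unit = {
  // reduceByKey needs an associative (and commutative) merge function, so partial
  // results can be combined map-side before the shuffle -- far less memory per key.
  val sums: RDD[(String, Double)] = events.reduceByKey(_ + _)

  // A non-associative computation such as an exact median needs all values for a key
  // in one place, which is what groupByKey provides -- and why a hot key with millions
  // of values can exhaust a single executor's heap.
  val medians: RDD[(String, Double)] = events.groupByKey().mapValues { vs =>
    val sorted = vs.toArray.sorted
    sorted(sorted.length / 2)
  }
}
```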

Re: Spark executor Memory profiling

2016-02-11 Thread Rishabh Wadhawan
[Quoted reply from Kuchekar (kuchekar.nil...@gmail.com), sent 11 February 2016 09:42 to Nirav Patel, Cc: spark users, Subject: Re: Spark executor Memory profiling: "Hi Nirav, I faced similar issue with Yarn, EM…"]

Re: Spark executor Memory profiling

2016-02-10 Thread Kuchekar
Hi Nirav, I faced a similar issue with Yarn, EMR 1.5.2, and the following Spark conf helped me. You can set the values accordingly: conf = (SparkConf().set("spark.master", "yarn-client").setAppName("HalfWay").set("spark.driver.memory", "15G").set("spark.yarn.am.memory", "15G"))
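The quoted message is truncated; a minimal Scala sketch of the same kind of configuration (Spark 1.x API). The driver and AM memory values are the ones quoted above; spark.executor.memory and spark.executor.instances are assumptions added for illustration, with placeholder values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Scala equivalent of the PySpark conf quoted in the thread.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("HalfWay")
  .set("spark.driver.memory", "15G")
  .set("spark.yarn.am.memory", "15G")
  .set("spark.executor.memory", "8G")      // assumption: executor-side setting this thread is about
  .set("spark.executor.instances", "15")   // assumption: matches the count quoted later in the thread

val sc = new SparkContext(conf)
```

Note that in yarn-client mode the driver JVM is already running by the time SparkConf is read, so spark.driver.memory is usually passed via spark-submit's --driver-memory flag (or spark-defaults.conf) rather than set programmatically.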

Spark executor Memory profiling

2016-02-10 Thread Nirav Patel
We have been trying to solve a memory issue with a Spark job that processes 150GB of data (on disk). It does a groupBy operation; some of the executors will receive somewhere around 2-4M Scala case objects to work with. We are using the following Spark config: "executorInstances": "15",
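The original message is truncated. As an illustration of the shape described, and not the poster's actual code, here is a minimal Scala sketch of such a groupBy stage; the Event case class and key field are hypothetical:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the Scala case objects mentioned above.
case class Event(key: String, payload: Array[Byte])

// After groupByKey, every value for a key is materialized as one in-memory
// Iterable on a single executor, so a key that gathers 2-4M Event objects
// must fit in that executor's heap all at once.
def groupStage(events: RDD[Event]): RDD[(String, Iterable[Event])] =
  events.keyBy(_.key).groupByKey()
```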