Thanks Nilesh,
Thanks for sharing those docs. I have come across most of those tunings in the
past and, believe me, I have tuned the heck out of this job. What I can't
believe is that Spark needs 4x more resources than MapReduce to run the same
job (for datasets on the order of >100GB).
I was able to run my job
Thanks Nilesh. I don't think there's heavy communication between the driver and
the executors. However, I'll try the settings you suggested.
I cannot replace groupBy with reduceBy, as it's not an associative
operation.
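For what it's worth, the reason associativity matters here is that reduceByKey-style aggregation pre-combines values per key inside each partition before the shuffle, so the reduce function gets applied to partial results in an arbitrary grouping. A minimal plain-Python sketch of that mechanism (no Spark; the helper names are hypothetical):

```python
from collections import defaultdict
from functools import reduce

def combine_partition(pairs, f):
    """Map-side combine: fold values per key within one partition."""
    acc = {}
    for k, v in pairs:
        acc[k] = f(acc[k], v) if k in acc else v
    return acc

def merge_partitions(parts, f):
    """Reduce-side merge of the per-partition partial results."""
    grouped = defaultdict(list)
    for part in parts:
        for k, v in part.items():
            grouped[k].append(v)
    return {k: reduce(f, vs) for k, vs in grouped.items()}

data = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
partitions = [data[:2], data[2:]]  # arbitrary partition boundary

add = lambda x, y: x + y  # associative, so grouping order doesn't matter
result = merge_partitions(
    [combine_partition(p, add) for p in partitions], add)
# result == {"a": 9, "b": 6}, regardless of how the data was partitioned
```

With a non-associative function the answer would depend on where the partition boundaries fall, which is why Spark can't apply this optimization for you, and why groupBy has to shuffle every value for a key to one executor.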
It is very frustrating, to be honest. It was a piece of cake with MapReduce
compared to this.
Regards,
> Arun.
>
> From: Kuchekar [kuchekar.nil...@gmail.com]
> Sent: 11 February 2016 09:42
> To: Nirav Patel
> Cc: spark users
> Subject: Re: Spark executor Memory profiling
>
Hi Nirav,
I faced a similar issue with YARN, EMR 1.5.2, and the following
Spark conf helped me. You can set the values accordingly:
conf = (SparkConf()
        .set("spark.master", "yarn-client")
        .setAppName("HalfWay")
        .set("spark.driver.memory", "15G")
        .set("spark.yarn.am.memory", "15G"))
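One caveat worth checking against the Spark docs for your version: in yarn-client mode the driver JVM is already running by the time the SparkConf is constructed, so spark.driver.memory set in code may not take effect. Passing it on the command line at submit time does; a sketch (the application file name is a placeholder):

```shell
# Equivalent settings supplied before the driver JVM starts
spark-submit \
  --master yarn-client \
  --driver-memory 15G \
  --conf spark.yarn.am.memory=15G \
  your_app.py
```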
We have been trying to solve a memory issue with a Spark job that processes
150GB of data (on disk). It does a groupBy operation; some of the executors
will receive somewhere around 2-4 million Scala case objects to work with. We
are using the following Spark config:
"executorInstances": "15",
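As a rough sanity check on sizing (back-of-envelope arithmetic with assumed numbers, not your actual data layout): if 150GB of on-disk input is spread evenly over 15 executors, each holds ~10GB of raw input, and deserialized JVM objects typically inflate that several-fold:

```python
# All numbers here are illustrative assumptions, not measured values.
data_gb = 150       # total on-disk input from the thread
executors = 15      # from "executorInstances": "15"
inflation = 3       # assumed JVM deserialization overhead (often 2-5x)

per_executor_raw_gb = data_gb / executors            # raw input per executor
per_executor_heap_gb = per_executor_raw_gb * inflation  # live objects on heap

print(per_executor_raw_gb, per_executor_heap_gb)  # 10.0 30.0
```

A skewed groupBy makes this worse: the estimate above assumes an even spread, but a hot key can concentrate far more than 1/15th of the data on one executor, which matches the symptom of only some executors blowing up.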