Hi, I have a Spark job that runs for around 4 hours; it shares a single SparkContext and runs many child jobs. When I look at each job in the UI I see a shuffle spill of around 30 to 40 GB, and because of that the executors frequently get lost for exceeding physical memory limits. How do I avoid shuffle spill?

I have tried almost all the optimisations and nothing is helping. I don't cache anything. I am using Spark 1.4.1, with Tungsten, codegen, etc. enabled. I have spark.shuffle.memoryFraction set to 0.2 and spark.storage.memoryFraction set to 0.2. I tried increasing the shuffle memory fraction to 0.6, but then the job halts in GC pauses, causing my executors to time out and eventually get lost.
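For reference, a hedged sketch of the kind of spark-submit invocation being tuned here (the class and jar names are placeholders, and the exact values are illustrative, not a recommendation). Since nothing is cached, storage memory can stay small while the shuffle fraction is raised only moderately, and a higher partition count shrinks each task's shuffle buffer:

```shell
# Sketch only: com.example.MyJob and my-job.jar are placeholders.
# Spark 1.4.x config names: spark.shuffle.memoryFraction (default 0.2)
# and spark.storage.memoryFraction (default 0.6).
# Raising the shuffle fraction to 0.3-0.4 instead of 0.6 leaves more heap
# headroom and tends to avoid the long GC pauses described above.
# Increasing spark.default.parallelism makes each shuffle partition smaller,
# so individual tasks are less likely to spill at all.
spark-submit \
  --class com.example.MyJob \
  --conf spark.shuffle.memoryFraction=0.4 \
  --conf spark.storage.memoryFraction=0.2 \
  --conf spark.default.parallelism=2000 \
  --conf spark.shuffle.spill.compress=true \
  my-job.jar
```

In general, increasing the number of shuffle partitions (via spark.default.parallelism for RDDs, or repartitioning before a wide operation) reduces per-task shuffle data and is often more effective against spill than shifting memory fractions.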
Please guide. Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-avoid-Spark-shuffle-spill-memory-tp24960.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.