Hi, I have a Spark job whose executors hit an OOM issue after some time, and the
job then hangs, followed by a couple of IOExceptions, "RPC client
disassociated", "shuffle not found", etc.

I have tried almost everything and don't know how to solve this OOM issue;
please guide me, I am fed up by now. Here is what I tried, but nothing worked
(a rough sketch of how I apply these settings follows the list):

- I tried 60 executors, each with 12 GB / 2 cores
- I tried 30 executors, each with 20 GB / 2 cores
- I tried 40 executors, each with 30 GB / 6 cores (I also tried 7 and 8 cores)
- I tried setting spark.storage.memoryFraction to 0.2 to relieve the OOM; I
also tried 0.0
- I tried setting spark.shuffle.memoryFraction to 0.4, since I need more
shuffle memory
- I tried setting spark.default.parallelism to 500, 1000, and 1500, but it did
not help avoid the OOM. What is the ideal value for it?
- I also tried setting spark.sql.shuffle.partitions to 500, but it did not
help; it just creates 500 output part files. Please help me understand the
difference between spark.default.parallelism and spark.sql.shuffle.partitions.
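
For context, here is roughly how I apply the settings above. This is a minimal
sketch only; the app name is a placeholder, and the executor count/memory/cores
shown are just the values from my last attempt (they varied across the runs
listed above):

import org.apache.spark.SparkConf

// Sketch of the configuration from the last attempt; executor
// count/memory/cores varied across the runs listed above.
val conf = new SparkConf()
  .setAppName("my-group-by-job")               // placeholder name
  .set("spark.executor.instances", "40")
  .set("spark.executor.memory", "30g")
  .set("spark.executor.cores", "6")
  .set("spark.storage.memoryFraction", "0.2")  // also tried 0.0
  .set("spark.shuffle.memoryFraction", "0.4")
  .set("spark.default.parallelism", "1000")    // also tried 500 and 1500
  .set("spark.sql.shuffle.partitions", "500")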

My data is skewed, but it is not that large, so I don't understand why it hits
OOM. I don't cache anything; I just have four group-by queries that I run via
hiveContext.sql(). I spawn around 1000 threads from the driver, and each thread
executes these four queries.
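
To make the driver side concrete, it looks roughly like this. This is a
simplified sketch only: the queries, table names, thread-pool setup, and output
path are placeholders for this post, not my real ones.

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("my-group-by-job"))
val hiveContext = new HiveContext(sc)

// ~1000 driver-side threads, all sharing the same HiveContext
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1000))

// placeholder queries; the real ones are four GROUP BYs over Hive tables
val groupByQueries = Seq(
  "SELECT k, COUNT(*) FROM t1 GROUP BY k",
  "SELECT k, SUM(v)   FROM t2 GROUP BY k",
  "SELECT k, MAX(v)   FROM t3 GROUP BY k",
  "SELECT k, AVG(v)   FROM t4 GROUP BY k"
)

val jobs = (1 to 1000).map { i =>
  Future {
    groupByQueries.zipWithIndex.foreach { case (q, j) =>
      // placeholder sink; each thread writes its results somewhere under /tmp/out
      hiveContext.sql(q).write.mode("overwrite").parquet(s"/tmp/out/run_${i}_query_$j")
    }
  }
}

jobs.foreach(Await.ready(_, Duration.Inf))  // wait for all threads to finish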


