Hi Al M,

You could try giving more main memory to the shuffle process; that should
reduce the amount of data spilled to disk. By default the shuffle memory
fraction is 20% of the safe memory, which works out to 16% of the overall
heap (0.2 shuffle fraction x 0.8 safety fraction = 0.16). So when we set
executor memory, only a small fraction of it is actually available to the
shuffle, which causes more and more spillage to disk. The good news is that
we can change that fraction and give more memory to the shuffle. You just
need to set two properties:

1 : set 'spark.storage.memoryFraction' to 0.4 (the default is 0.6)

2 : set 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2)

This should make a significant difference in the shuffle's disk usage.
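
For reference, here is a minimal sketch of how those two settings can be
applied when building the SparkContext (this assumes the legacy pre-1.6
memory manager, where these properties apply; the app name is just a
placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Shrink the storage pool and grow the shuffle pool.
    // Both properties belong to the legacy memory manager.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-example")        // placeholder name
      .set("spark.storage.memoryFraction", "0.4")  // default 0.6
      .set("spark.shuffle.memoryFraction", "0.4")  // default 0.2
    val sc = new SparkContext(conf)

The same pair of settings can also be passed on the command line, e.g.
spark-submit --conf spark.storage.memoryFraction=0.4 --conf
spark.shuffle.memoryFraction=0.4, if you'd rather not hard-code them.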

Thank you

-
Himanshu Mehra


