Hi Al M,

You could try giving the shuffle process more memory, which should reduce how much it spills to disk. By default, 'spark.shuffle.memoryFraction' is 0.2, and it is applied after the safety fraction, so the shuffle effectively gets about 16% of the overall heap (0.2 x 0.8). In other words, when we set the executor memory, only a small fraction of it is available to the shuffle, which forces more and more spilling to disk. The good news is that this fraction is configurable, so we can give the shuffle more memory. You just need to set two properties:
1 : set 'spark.storage.memoryFraction' to 0.4 (the default is 0.6)
2 : set 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2)

This should make a significant difference in the shuffle's disk usage.

Thank you
- Himanshu Mehra

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-tp23279p23334.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
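P.S. For reference, a minimal sketch of setting the two fractions above when building the SparkContext (the app name and master are placeholders; these properties apply to the pre-1.6 legacy memory model):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Shift memory from the storage pool toward the shuffle pool
// to reduce spill to disk during shuffles.
val conf = new SparkConf()
  .setAppName("shuffle-tuning-example") // hypothetical app name
  .setMaster("local[*]")                // placeholder master
  .set("spark.storage.memoryFraction", "0.4") // default 0.6
  .set("spark.shuffle.memoryFraction", "0.4") // default 0.2

val sc = new SparkContext(conf)
```

The same settings can also be passed at submit time with `--conf spark.storage.memoryFraction=0.4 --conf spark.shuffle.memoryFraction=0.4`, which avoids hard-coding them in the application.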