Re: spark.shuffle.spill=false ignored?

2015-09-09 Thread Eric Walker
Hi Richard, I've stepped away from this issue since I raised my question. One detail that was unknown at the time: the application did not run out of disk space every time spilling to disk occurred; that appears to have been a one-off problem. The

Re: spark.shuffle.spill=false ignored?

2015-09-09 Thread Richard Marscher
Hi Eric, I just wanted to do a sanity check: do you know what paths it is trying to write to? I ask because even without spilling, shuffles always write to disk first before transferring data across the network. I encountered this myself at one point, when we accidentally had /tmp mounted on
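
A rough sketch of the sanity check described above, for anyone wanting to see how much room the shuffle directories actually have. It assumes the scratch directories are exposed through the `SPARK_LOCAL_DIRS` or `LOCAL_DIRS` environment variables and falls back to the system temp directory otherwise; the helper names `candidate_local_dirs` and `free_space_report` are made up for illustration and are not part of Spark. The paths a given cluster really uses may differ, so treat the result as a hint only.

```python
import os
import shutil
import tempfile


def candidate_local_dirs(env=None):
    """Return directories Spark might use for shuffle files.

    Assumption: the directories come from SPARK_LOCAL_DIRS or LOCAL_DIRS
    as comma-separated lists. This is a guess at the setup, not a lookup
    of the running application's actual configuration.
    """
    env = os.environ if env is None else env
    dirs = []
    for var in ("SPARK_LOCAL_DIRS", "LOCAL_DIRS"):
        value = env.get(var, "")
        dirs.extend(d.strip() for d in value.split(",") if d.strip())
    # Fall back to the system temp directory when neither variable is set.
    return dirs or [tempfile.gettempdir()]


def free_space_report(dirs):
    """Map each existing directory to its free space in GiB."""
    report = {}
    for d in dirs:
        if os.path.isdir(d):
            report[d] = shutil.disk_usage(d).free / 2**30
    return report


if __name__ == "__main__":
    for path, free_gib in free_space_report(candidate_local_dirs()).items():
        print(f"{path}: {free_gib:.1f} GiB free")
```

Running this on a worker node would show whether the directories sit on a small partition, which is worth ruling out before concluding that the spill setting is being ignored.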

spark.shuffle.spill=false ignored?

2015-09-03 Thread Eric Walker
Hi, I am using Spark 1.3.1 on EMR with lots of memory. I have attempted to run a large pyspark job several times, specifying `spark.shuffle.spill=false` in different ways. It seems that the setting is ignored, at least partially, and some of the tasks start spilling large amounts of data to