
Re: spark.shuffle.spill=false ignored?

2015-09-09 Thread Eric Walker

Hi Richard, I've stepped away from this issue since I raised my question. One detail I didn't know at the time: the application did not run out of disk space every time it spilled to disk. That problem appears to have been a one-off. The

Re: spark.shuffle.spill=false ignored?

2015-09-09 Thread Richard Marscher

Hi Eric, I just wanted to do a sanity check: do you know what paths it is trying to write to? I ask because even without spilling, shuffles always write to disk first before transferring data across the network. I ran into this myself at one point, when we had accidentally mounted /tmp on
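Because shuffle output goes to local scratch directories whether or not spilling is enabled, a quick first check is how much free space those directories have. The sketch below assumes you pass it the comma-separated value of `spark.local.dir` (default `/tmp`). Per the Spark configuration docs, on YARN (as on EMR) that setting is overridden by the `LOCAL_DIRS` environment variable set by the cluster manager, so use those paths there instead. `scratch_dir_report` is a hypothetical helper, not part of Spark.

```python
import shutil


def scratch_dir_report(local_dirs: str) -> list[tuple[str, float]]:
    """Return (path, free GiB) for each directory in a comma-separated list.

    Hypothetical helper: `local_dirs` mirrors the comma-separated format
    of spark.local.dir (or YARN's LOCAL_DIRS). It only reports free space;
    it does not inspect what Spark has written there.
    """
    report = []
    for path in (p.strip() for p in local_dirs.split(",")):
        if not path:
            continue  # skip empty entries, e.g. from a trailing comma
        # shutil.disk_usage raises FileNotFoundError if the path is missing
        usage = shutil.disk_usage(path)
        report.append((path, usage.free / 2**30))
    return report
```

For example, `scratch_dir_report("/tmp")` on a worker node returns one `(path, free_gib)` pair, which makes it easy to spot a scratch directory sitting on a small root volume.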

spark.shuffle.spill=false ignored?

2015-09-03 Thread Eric Walker

Hi, I am using Spark 1.3.1 on EMR with lots of memory. I have attempted to run a large pyspark job several times, specifying `spark.shuffle.spill=false` in different ways. It seems that the setting is ignored, at least partially, and some of the tasks start spilling large amounts of data to
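When a property is specified "in different ways", it helps to confirm which source actually wins. The Spark configuration docs give the precedence as: values set directly on a `SparkConf` in code, then `--conf` flags passed to `spark-submit`, then `spark-defaults.conf`. The following is a minimal sketch of that lookup order using hypothetical helpers (`parse_spark_defaults`, `parse_submit_confs`, `effective_setting`). It only models where the value comes from and says nothing about whether Spark 1.3.1 honours `spark.shuffle.spill=false` once it is set.

```python
def parse_spark_defaults(text: str) -> dict[str, str]:
    """Parse spark-defaults.conf-style text: one whitespace-separated 'key value' per line."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def parse_submit_confs(args: list[str]) -> dict[str, str]:
    """Collect `--conf key=value` pairs from a spark-submit argument list (hypothetical helper)."""
    props = {}
    it = iter(args)
    for arg in it:
        if arg == "--conf":
            key, _, value = next(it, "").partition("=")
            if key:
                props[key] = value
    return props


def effective_setting(key, defaults_conf, submit_conf, code_conf, default=None):
    """Resolve `key` using the documented order: SparkConf in code > spark-submit --conf > spark-defaults.conf."""
    for source in (code_conf, submit_conf, defaults_conf):
        if key in source:
            return source[key]
    return default
```

For instance, with `spark.shuffle.spill true` in `spark-defaults.conf` and `--conf spark.shuffle.spill=false` on the command line, `effective_setting` returns `"false"`, but a `SparkConf().set("spark.shuffle.spill", "true")` call in the job itself would take precedence over both.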