Hi Richard,
I've stepped away from this issue since I raised my question. One
detail I didn't know at the time: the application did not run out of
disk space every time it spilled to disk; that appears to have been a
one-off.
The
Hi Eric,
I just wanted to do a sanity check: do you know what paths it is trying to
write to? I ask because even without spilling, shuffles always write to
disk before transferring data across the network. I once ran into this
myself when we accidentally had /tmp mounted on
Hi,
I am using Spark 1.3.1 on EMR with lots of memory. I have attempted to run
a large PySpark job several times, specifying `spark.shuffle.spill=false`
in different ways. It seems that the setting is ignored, at least
partially, and some of the tasks start spilling large amounts of data to