I'm running a Spark Streaming job on Spark 1.3.1 that uses
updateStateByKey.  The job works perfectly fine, but at some point (after a
few runs) it starts shuffling to disk no matter how much memory I give the
executors.
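
For context, here is a stripped-down sketch of the shape of the job (the
socket source, port, checkpoint path, and the running-sum update function
are stand-ins, not my actual code):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StatefulJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StatefulJob")
        val ssc = new StreamingContext(conf, Seconds(10))
        // updateStateByKey requires a checkpoint directory
        ssc.checkpoint("hdfs:///tmp/checkpoints")

        // placeholder input: lines of "key value" pairs from a socket
        val pairs = ssc.socketTextStream("localhost", 9999)
          .map(_.split(" "))
          .map(a => (a(0), a(1).toLong))

        // running sum per key; the state RDD grows with the number of
        // distinct keys seen so far
        val state = pairs.updateStateByKey[Long] {
          (newValues: Seq[Long], running: Option[Long]) =>
            Some(newValues.sum + running.getOrElse(0L))
        }

        state.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }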

I have tried changing --executor-memory on
spark-submit, as well as spark.shuffle.memoryFraction,
spark.storage.memoryFraction, and spark.storage.unrollFraction.  But no
matter how I configure these, it always starts spilling to disk at around
2.5 GB.
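
For example, one of the configurations I tried looked roughly like this
(the class name, jar, memory size, and fraction values are illustrative,
not my exact settings):

    spark-submit \
      --class StatefulJob \
      --executor-memory 8g \
      --conf spark.shuffle.memoryFraction=0.4 \
      --conf spark.storage.memoryFraction=0.4 \
      --conf spark.storage.unrollFraction=0.2 \
      statefuljob.jar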

What is the best way to avoid spilling the shuffle to disk?
