Hello! In my Spark job, I see that Shuffle Spill (Disk) is greater than Shuffle Spill (Memory). The spark.shuffle.compress parameter is left at its default (true?), so I would expect the size on disk to be smaller, which isn't the case here. I've been having some performance issues as well, and I suspect this is somehow related.
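In case it helps, this is roughly how I'm checking the effective shuffle settings from the shell (just a sketch; the fallback strings are only the documented defaults, since nothing is set explicitly in spark-defaults.conf or on the submit command):

```scala
// Spark 2.0 spark-shell: inspect the conf actually in effect for this job.
// If a key was never set, the second argument is returned, so these print
// the defaults unless something overrides them.
val conf = spark.sparkContext.getConf
println(conf.get("spark.shuffle.compress", "true"))        // default: true
println(conf.get("spark.shuffle.spill.compress", "true"))  // default: true
println(conf.get("spark.serializer", "org.apache.spark.serializer.JavaSerializer"))
```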
All memory configuration parameters are at their defaults, and I'm running Spark 2.0. For one stage I see:

Shuffle Spill (Memory): 712.0 MB
Shuffle Spill (Disk): 7.9 GB

To my surprise, I also see the following for some tasks:

Shuffle Spill (Memory): 0.0 B
Shuffle Spill (Disk): 77.5 MB

I would appreciate it if anyone could explain this behavior.

-Prayag