Shuffle write always ends up writing the data to the file system as a set of files. If you want to avoid disk writes, you can mount a ramdisk and point "spark.local.dir" at it. Shuffle output will then be written to a memory-based file system and will not incur disk I/O.
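As a rough sketch of the setting above, assuming a tmpfs ramdisk is already mounted at the hypothetical path /mnt/ramdisk on every worker node (the path, app name, and object name are illustrative, not from your setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RamdiskShuffleExample {
  def main(args: Array[String]): Unit = {
    // Assumption: /mnt/ramdisk is a tmpfs mount that exists on every worker,
    // created beforehand with something like
    //   mount -t tmpfs -o size=4g tmpfs /mnt/ramdisk
    val conf = new SparkConf()
      .setAppName("ramdisk-shuffle-example")
      // Shuffle files and other scratch data are written under this directory.
      .set("spark.local.dir", "/mnt/ramdisk")

    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(conf)

    // reduceByKey forces a shuffle, so its map output lands in spark.local.dir.
    val counts = sc.parallelize(1 to 1000)
      .map(i => (i % 10, 1))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)
    sc.stop()
  }
}
```

Note that on a cluster the value may be overridden by the SPARK_LOCAL_DIRS (standalone) or LOCAL_DIRS (YARN) environment variables set by the cluster manager, so you may need to set it there instead. The ramdisk also takes memory away from the executors, so size it with that in mind.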

Thanks
Jerry

2015-03-30 17:15 GMT+08:00 shahab <shahab.mok...@gmail.com>:

> Hi,
>
> I was looking at SparkUI, Executors, and I noticed that I have 597 MB for
> "Shuffle while I am using cached temp-table and the Spark had 2 GB free
> memory (the number under Memory Used is 597 MB /2.6 GB) ?!!!
>
> Shouldn't be Shuffle Write be zero and everything (map/reduce) tasks be
> done in memory?
>
> best,
>
> /Shahab
>