I think for most jobs the bottleneck isn't writing shuffle data to disk: the shuffle data has to be sent across the network anyway, and that transfer usually dominates the disk write.
You can always use a ramdisk yourself, e.g. by pointing spark.local.dir at a tmpfs mount (see the sketch after the quoted message). Requiring a ramdisk by default would significantly complicate configuration and platform portability.

On Mon, Nov 23, 2015 at 5:36 PM, huan zhang <[email protected]> wrote:
> Hi All,
> I'm wondering why shuffle in Spark writes shuffle data to disk by
> default?
> On Stack Overflow, someone said it's used by FTS, but a node going
> down is the most common kind of fault, and writing to disk doesn't
> help FTS in that case either.
> So why not use a ramdisk by default instead of SSD or HDD only?
>
> Thanks
> Hubert Zhang
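Concretely, here is a minimal sketch of the do-it-yourself route, assuming a Linux tmpfs mount at /dev/shm (standard on most distributions); the subdirectory and app name are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.local.dir controls where Spark writes scratch data,
    // including shuffle files and spill files. Pointing it at a
    // tmpfs mount keeps that data in memory-backed storage.
    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk")            // placeholder name
      .set("spark.local.dir", "/dev/shm/spark-tmp") // assumed tmpfs path

    val sc = new SparkContext(conf)

One caveat: in cluster modes this setting can be overridden by the cluster manager (e.g. on YARN the NodeManager's local directories take precedence), so the directory may need to be configured there instead. And since tmpfs competes with the JVM heap for physical memory, sizing the mount too generously can push executors into swapping.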
