I think for most jobs the bottleneck isn't writing shuffle data to disk:
shuffle data has to be "shuffled", i.e. sent across the network, and that
transfer usually dominates the local disk write.

You can always use a ramdisk yourself. Requiring a ramdisk by default would
significantly complicate configuration and platform portability.
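If you do want shuffle files in RAM, one common approach is to mount a tmpfs
and point Spark's local directories at it via spark.local.dir. A minimal
sketch (the mount point /mnt/spark-ram and the 16g size are illustrative
choices, not values from this thread):

```shell
# Create a RAM-backed filesystem (contents are lost on reboot / node failure).
sudo mkdir -p /mnt/spark-ram
sudo mount -t tmpfs -o size=16g tmpfs /mnt/spark-ram

# Point Spark's scratch space (including shuffle output) at the tmpfs,
# either in conf/spark-defaults.conf:
#   spark.local.dir  /mnt/spark-ram
# or per job on the command line:
spark-submit --conf spark.local.dir=/mnt/spark-ram ...
```

Note that on YARN the local dirs are typically taken from the node manager's
configuration rather than spark.local.dir, and a tmpfs competes with
executors for the same physical memory, so size it carefully.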


On Mon, Nov 23, 2015 at 5:36 PM, huan zhang <zhanghuan...@gmail.com> wrote:

> Hi All,
>     I'm wonderring why does shuffle in spark write shuffle data to disk by
> default?
>     In Stackoverflow, someone said it's used by FTS, but node down is the
> most common reason of fault, and write to disk cannot do FTS in this case
> either.
>     So why not use ramdisk as default instread of SDD or HDD only?
>
> Thanks
> Hubert Zhang
>
