Hey Larry,

I don’t think Hadoop puts shuffle output in HDFS. Instead, its behavior is 
the same as Spark’s: it stores mapper output (shuffle) data on local disks. 
You might have misunderstood something ☺.
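
If the goal is only to move shuffle files off /tmp, the local scratch location 
is controlled by the spark.local.dir setting, which takes a comma-separated 
list of local directories. Below is a minimal sketch, not a definitive recipe: 
the helper name, the directory paths and the application name are all 
hypothetical, and on a cluster the value may be overridden by the cluster 
manager (for example SPARK_LOCAL_DIRS in standalone mode or LOCAL_DIRS on YARN).

```python
import shlex


def spark_submit_command(app, local_dirs):
    """Build a spark-submit command line that sets spark.local.dir.

    `app` and `local_dirs` are illustrative inputs: the application to run
    and the local directories (ideally one per physical disk) for shuffle files.
    """
    # spark.local.dir accepts a comma-separated list of local directories.
    conf = "spark.local.dir=" + ",".join(local_dirs)
    return ["spark-submit", "--conf", conf, app]


# Hypothetical paths and application name, for illustration only.
cmd = spark_submit_command("my_app.py", ["/mnt/disk1/spark", "/mnt/disk2/spark"])
print(" ".join(shlex.quote(part) for part in cmd))
```

These are still local disks, so this changes where shuffle data lands without 
paying the cost of writing it to HDFS or NFS.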

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Monday, January 26, 2015 3:03 PM
To: Shao, Saisai
Cc: u...@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS

Hi Jerry,

Thanks for your reply.

The reason I have this question is that in Hadoop, mapper intermediate output 
(shuffle) will be stored in HDFS. I think the default location for Spark is 
/tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai 
<saisai.s...@intel.com> wrote:
Hi Larry,

I don’t think Spark’s current shuffle supports HDFS as a shuffle output. 
Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? 
Doing so will severely increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?
