If there is no Reducer, there is no shuffle. The Mapper output goes to
HDFS, yes. But the question here is about shuffle files, right? Those
are written by the Mapper to local disk. Reducers then fetch them from
the Mappers over the network. Shuffle files do not go to HDFS.
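
To make the data flow above concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: the function names and the shuffle_<map>_<reduce>.txt file naming are made up for illustration, and reading a file from another mapper's directory stands in for the network fetch. Each map task writes one file per reducer into its own local directory, and each reduce task reads its partition from every mapper.

```python
import os
import tempfile
from collections import defaultdict


def run_map_task(map_id, records, num_reducers, local_dir):
    """Partition (key, value) records by reducer; write one local file per partition."""
    buckets = defaultdict(list)
    for key, value in records:
        # Deterministic toy partitioner (hypothetical, not what Spark or Hadoop use).
        buckets[sum(key.encode()) % num_reducers].append((key, value))
    for reduce_id in range(num_reducers):
        path = os.path.join(local_dir, f"shuffle_{map_id}_{reduce_id}.txt")
        with open(path, "w") as f:
            for key, value in buckets[reduce_id]:
                f.write(f"{key}\t{value}\n")


def run_reduce_task(reduce_id, mapper_dirs):
    """Read this reducer's partition from every mapper's directory and sum per key."""
    totals = defaultdict(int)
    for map_id, local_dir in enumerate(mapper_dirs):
        # In a real cluster this read would be a fetch over the network.
        path = os.path.join(local_dir, f"shuffle_{map_id}_{reduce_id}.txt")
        with open(path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                totals[key] += int(value)
    return dict(totals)


def word_count(splits, num_reducers=2):
    """Run one map task per input split, then one reduce task per partition."""
    with tempfile.TemporaryDirectory() as root:
        mapper_dirs = []
        for map_id, words in enumerate(splits):
            local_dir = os.path.join(root, f"mapper_{map_id}")
            os.makedirs(local_dir)
            run_map_task(map_id, [(w, 1) for w in words], num_reducers, local_dir)
            mapper_dirs.append(local_dir)
        result = {}
        for reduce_id in range(num_reducers):
            result.update(run_reduce_task(reduce_id, mapper_dirs))
        return result
```

Nothing in this sketch touches a shared file system: the intermediate files live only in each mapper's scratch directory. In Spark that directory is set by spark.local.dir, which defaults to /tmp.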
On Mon, Jan 26, 2015 at
CC: u...@spark.incubator.apache.org
Subject: RE: Shuffle to HDFS
Hey Larry,
I don’t think Hadoop puts shuffle output in HDFS. Instead, its behavior is
the same as Spark’s: it stores mapper output (shuffle) data on local disks.
You might have misunderstood something :)
Thanks
Jerry
From: Larry Liu [mailto:la
, 2015 3:03 PM
To: Shao, Saisai
Cc: u...@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS
Hi Jerry,
Thanks for your reply.
The reason I have this question is that in Hadoop, mapper intermediate
output (shuffle) will be stored in HDFS. I think the default location for
Spark is /tmp.
Larry
On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai wrote:
Hi Larry,
I don’t think Spark’s current shuffle supports HDFS as a shuffle output location.
Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? Doing
so will severely increase the shuffle time.
Thanks
Jerry
From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Sunday, January 25, 20