
Re: RE: Shuffle to HDFS

2015-01-26 Thread bit1...@163.com

I had also thought that Hadoop mapper output is saved on HDFS, at least 
when the job has only a Mapper and no Reducer.
If there is a Reducer, is the map output then saved on local disk?

From: Shao, Saisai
Date: 2015-01-26 15:23
To: Larry Liu
CC: u...@spark.incubator.apache.org
Subject: RE: Shuffle to HDFS
Hey Larry,

I don’t think Hadoop puts shuffle output in HDFS. Its behavior is the same 
as Spark’s: it stores mapper output (shuffle) data on local disks. 
You might have misunderstood something :).

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com] 
Sent: Monday, January 26, 2015 3:03 PM
To: Shao, Saisai
Cc: u...@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS

Hi Jerry,

Thanks for your reply.

The reason I have this question is that in Hadoop, mapper intermediate output 
(shuffle) is stored in HDFS. I think the default location for Spark is 
/tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi Larry,

I don’t think Spark’s current shuffle supports HDFS as a shuffle output 
location. In any case, is there a specific reason to spill shuffle data to 
HDFS or NFS? Doing so will severely increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com] 
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?

Re: RE: Shuffle to HDFS

2015-01-26 Thread Sean Owen

If there is no Reducer, there is no shuffle. The Mapper output goes to
HDFS, yes. But the question here is about shuffle files, right? Those
are written by the Mapper to local disk. Reducers then fetch them from
the Mappers over the network. Shuffle files do not go to HDFS.
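
The flow described above can be sketched as a toy, single-process simulation in plain Python. It is not Hadoop or Spark code: the function names (run_map_task, run_reduce_task, word_count), the file naming scheme, and the CRC-based partitioner are all made up for illustration. Each "mapper" writes one file per reducer partition into its own local directory, and each "reducer" then reads its partition from every mapper's directory, with the file read standing in for the network fetch.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    # Hypothetical partitioner: a stable hash so a key always lands in one partition.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(map_id, records, num_reducers, local_dir):
    """Write one shuffle file per reducer partition into this mapper's local dir."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    paths = {}
    for part in range(num_reducers):
        path = os.path.join(local_dir, f"shuffle_{map_id}_{part}.txt")
        with open(path, "w") as f:
            for key, value in buckets.get(part, []):
                f.write(f"{key}\t{value}\n")
        paths[part] = path
    return paths


def run_reduce_task(part, map_outputs):
    """Read this partition's file from every mapper (stand-in for a network fetch)."""
    totals = defaultdict(int)
    for paths in map_outputs:
        with open(paths[part]) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                totals[key] += int(value)
    return dict(totals)


def word_count(splits, num_reducers=2):
    with tempfile.TemporaryDirectory() as root:
        map_outputs = []
        for map_id, split in enumerate(splits):
            # Each mapper gets its own "local disk" directory; nothing goes to a shared FS.
            local_dir = os.path.join(root, f"mapper_{map_id}")
            os.makedirs(local_dir)
            records = [(word, 1) for word in split.split()]
            map_outputs.append(run_map_task(map_id, records, num_reducers, local_dir))
        result = {}
        for part in range(num_reducers):
            result.update(run_reduce_task(part, map_outputs))
        return result


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Only the final result would be written to HDFS in a real job; the per-partition files here are temporary and are deleted when the run ends, which loosely mirrors shuffle files being scratch data on local disk.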

On Mon, Jan 26, 2015 at 10:01 AM, bit1...@163.com bit1...@163.com wrote:
 I had also thought that Hadoop mapper output is saved on HDFS, at
 least when the job has only a Mapper and no Reducer.
 If there is a Reducer, is the map output then saved on local disk?


RE: Shuffle to HDFS

2015-01-25 Thread Shao, Saisai

Hi Larry,

I don’t think Spark’s current shuffle supports HDFS as a shuffle output 
location. In any case, is there a specific reason to spill shuffle data to 
HDFS or NFS? Doing so will severely increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?

Re: Shuffle to HDFS

2015-01-25 Thread Larry Liu

Hi Jerry,

Thanks for your reply.

The reason I have this question is that in Hadoop, mapper intermediate
output (shuffle) is stored in HDFS. I think the default location for
Spark is /tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai saisai.s...@intel.com wrote:

 Hi Larry,

 I don’t think Spark’s current shuffle supports HDFS as a shuffle output
 location. In any case, is there a specific reason to spill shuffle data to
 HDFS or NFS? Doing so will severely increase the shuffle time.

 Thanks
 Jerry

 From: Larry Liu [mailto:larryli...@gmail.com]
 Sent: Sunday, January 25, 2015 4:45 PM
 To: u...@spark.incubator.apache.org
 Subject: Shuffle to HDFS

 How can I change the shuffle output location to HDFS or NFS?

RE: Shuffle to HDFS

2015-01-25 Thread Shao, Saisai

Hey Larry,

I don’t think Hadoop puts shuffle output in HDFS. Its behavior is the same 
as Spark’s: it stores mapper output (shuffle) data on local disks. 
You might have misunderstood something ☺.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Monday, January 26, 2015 3:03 PM
To: Shao, Saisai
Cc: u...@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS

Hi Jerry,

Thanks for your reply.

The reason I have this question is that in Hadoop, mapper intermediate output 
(shuffle) is stored in HDFS. I think the default location for Spark is 
/tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi Larry,

I don’t think Spark’s current shuffle supports HDFS as a shuffle output 
location. In any case, is there a specific reason to spill shuffle data to 
HDFS or NFS? Doing so will severely increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?
