Re: RE: Shuffle to HDFS

2015-01-26 Thread bit1...@163.com
...@spark.incubator.apache.org Subject: RE: Shuffle to HDFS Hey Larry, I don’t think Hadoop puts shuffle output in HDFS; instead, its behavior is the same as Spark’s: it stores mapper output (shuffle) data on local disks. You might have misunderstood something ☺. Thanks Jerry From: Larry Liu [mailto:larryli

Re: RE: Shuffle to HDFS

2015-01-26 Thread Sean Owen
If there is no Reducer, there is no shuffle. The Mapper output goes to HDFS, yes. But the question here is about shuffle files, right? Those are written by the Mapper to local disk. Reducers then fetch them from the Mappers over the network. Shuffle files do not go to HDFS. On Mon, Jan 26, 2015 at
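
To make the flow described above concrete, here is a minimal Python sketch of a word count where each map task writes one partitioned shuffle file per reducer to a local scratch directory, and each reduce task reads only its own partition from every map task. This is a toy simulation, not Spark or Hadoop code: the function names and the shuffle_<map>_<reducer>.txt file naming are hypothetical, and a plain local file read stands in for the network fetch that real reducers perform.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    # Deterministic hash partitioning (crc32 is an illustrative choice).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(map_id, records, num_reducers, local_dir):
    """Write one shuffle file per reducer into local_dir (local disk, not HDFS)."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    for reducer_id in range(num_reducers):
        # Hypothetical file naming, for illustration only.
        path = os.path.join(local_dir, f"shuffle_{map_id}_{reducer_id}.txt")
        with open(path, "w") as f:
            for key, value in buckets[reducer_id]:
                f.write(f"{key}\t{value}\n")


def run_reduce_task(reducer_id, num_maps, local_dir):
    """Read this reducer's partition from every map task and sum the counts."""
    totals = defaultdict(int)
    for map_id in range(num_maps):
        # In a real cluster this read would be a fetch over the network.
        path = os.path.join(local_dir, f"shuffle_{map_id}_{reducer_id}.txt")
        with open(path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                totals[key] += int(value)
    return dict(totals)


def word_count(splits, num_reducers=2):
    """Run all map tasks, then all reduce tasks, and merge the reducer outputs."""
    with tempfile.TemporaryDirectory() as local_dir:
        for map_id, words in enumerate(splits):
            run_map_task(map_id, [(w, 1) for w in words], num_reducers, local_dir)
        result = {}
        for reducer_id in range(num_reducers):
            result.update(run_reduce_task(reducer_id, len(splits), local_dir))
        return result
```

Calling word_count([["a", "b", "a"], ["b", "c"]]) sums the counts per word across both input splits. The scratch directory is deleted once the job finishes, which mirrors the point that shuffle files are temporary intermediate data rather than job output.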

RE: Shuffle to HDFS

2015-01-25 Thread Shao, Saisai
Hi Larry, I don’t think Spark’s current shuffle supports HDFS as a shuffle output. Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? Doing so will severely increase the shuffle time. Thanks Jerry From: Larry Liu [mailto:larryli...@gmail.com] Sent: Sunday, January 25

Re: Shuffle to HDFS

2015-01-25 Thread Larry Liu
, I don’t think Spark’s current shuffle supports HDFS as a shuffle output. Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? Doing so will severely increase the shuffle time. Thanks Jerry *From:* Larry Liu [mailto:larryli...@gmail.com] *Sent:* Sunday, January

RE: Shuffle to HDFS

2015-01-25 Thread Shao, Saisai
Hey Larry, I don’t think Hadoop puts shuffle output in HDFS; instead, its behavior is the same as Spark’s: it stores mapper output (shuffle) data on local disks. You might have misunderstood something ☺. Thanks Jerry From: Larry Liu [mailto:larryli...@gmail.com] Sent: Monday, January 26