Harsh, I did leave an escape route open with the bit about "corner cases" :-)
Anyway, I agree that HDFS has no notion of block 0. I just meant that if dfs.replication is 1, then under normal circumstances :-) no blocks of the output file will be written to node A. (For anyone who wants to check where blocks actually land, there is a small FileSystem API sketch at the bottom of this mail.)

Raj

----- Original Message -----
> From: Harsh J <ha...@cloudera.com>
> To: common-user@hadoop.apache.org; Raj Vishwanathan <rajv...@yahoo.com>
> Cc:
> Sent: Saturday, August 25, 2012 4:02 AM
> Subject: Re: doubt about reduce tasks and block writes
>
> Raj's almost right. In times of high load or space fill-up on a local
> DN, the NameNode may decide to pick a non-local DN for replica
> writing instead. In this way, node A may get a "copy 0" of a replica
> from a task. This is per the default block placement policy.
>
> P.s. Note that HDFS hardly makes any difference between replicas,
> hence there is no hard concept of a "copy 0" or "copy 1" block; at
> the NN level it treats all DNs in the pipeline equally, and the same
> goes for replicas.
>
> On Sat, Aug 25, 2012 at 4:14 AM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
>> But since node A has no TT running, it will not run map or reduce
>> tasks. When the reducer node writes the output file, the first block
>> will be written on the local node and never on node A.
>>
>> So, to answer the question, node A will contain copies of blocks of
>> all output files. It won't contain the copy 0 of any output file.
>>
>> I am reasonably sure about this, but there could be corner cases in
>> case of node failure and such like! I need to look into the code.
>>
>> Raj
>>
>>> ________________________________
>>> From: Marc Sturlese <marc.sturl...@gmail.com>
>>> To: hadoop-u...@lucene.apache.org
>>> Sent: Friday, August 24, 2012 1:09 PM
>>> Subject: doubt about reduce tasks and block writes
>>>
>>> Hey there,
>>> I have a doubt about reduce tasks and block writes. Does a reduce
>>> task always first write to HDFS on the node where it is placed (and
>>> then these blocks get replicated to other nodes)?
>>> If yes: in a cluster of 5 nodes, 4 of them running DN and TT and one
>>> (node A) running just a DN, would map tasks never read from node A
>>> when running MR jobs? This would be because maps have data locality
>>> and, if reduce tasks write first to the node where they live, one
>>> replica of each block would always be on a node that has a TT. Node A
>>> would just contain blocks created by the framework's replication, as
>>> no reduce task would ever write there directly. Is this correct?
>>> Thanks in advance
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
> --
> Harsh J
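P.S. For anyone who wants to double-check where the blocks of a reducer's output actually ended up, here is a rough, untested sketch using the standard FileSystem API. The class name and the path argument (e.g. a part-00000 / part-r-00000 file) are just placeholders; adjust for your own job output.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import java.util.Arrays;

  public class PrintBlockHosts {
    public static void main(String[] args) throws Exception {
      // Picks up core-site.xml / hdfs-site.xml from the classpath
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Placeholder: pass the output file to inspect, e.g. /user/you/out/part-r-00000
      Path file = new Path(args[0]);
      FileStatus status = fs.getFileStatus(file);

      // One BlockLocation per block; getHosts() lists the datanodes holding
      // a replica of that block (HDFS keeps no "copy 0" vs "copy 1" ordering)
      BlockLocation[] locs = fs.getFileBlockLocations(status, 0, status.getLen());
      for (int i = 0; i < locs.length; i++) {
        System.out.println("block " + i + ": " + Arrays.toString(locs[i].getHosts()));
      }
      fs.close();
    }
  }

The same information can also be pulled from the command line with
"hadoop fsck <path> -files -blocks -locations".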