Marc, see my inline comments.

On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese <marc.sturl...@gmail.com>wrote:

> Hey there,
> I have a doubt about reduce tasks and block writes. Does a reduce task
> always write first to HDFS on the node where it is placed? (and then those
> blocks would be replicated to other nodes)
>

Yes, if there is a DN running on that server (it's possible to be running a
TT without a DN).
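To make that concrete, here is a minimal sketch (in Python, not actual Hadoop code) of the default HDFS replica placement policy as described above: the writer's own DataNode gets the first replica, the second goes to a node on a different rack, and the third to another node on that same remote rack. The function name and the rack-map structure are illustrative assumptions, not Hadoop APIs.

```python
def choose_replica_nodes(writer, nodes_by_rack, replication=3):
    """Simplified model of HDFS default block placement.

    replica 1: the writer's own DataNode (if it runs one)
    replica 2: a node on a different rack
    replica 3: another node on that same remote rack
    """
    # Invert the rack map: node -> rack
    rack_of = {n: r for r, ns in nodes_by_rack.items() for n in ns}
    targets = []
    if writer in rack_of:
        targets.append(writer)  # replica 1: local to the writer (e.g. the reducer)
    # Pick a rack different from replica 1's rack for the remote copies
    other_racks = [r for r in sorted(nodes_by_rack)
                   if not targets or r != rack_of[targets[0]]]
    if other_racks:
        remote_rack = other_racks[0]
        for n in sorted(nodes_by_rack[remote_rack]):
            if len(targets) >= replication:
                break
            if n not in targets:
                targets.append(n)  # replicas 2 and 3 on the remote rack
    return targets[:replication]

# A reducer on node "B" always keeps one replica locally:
print(choose_replica_nodes("B", {"rack1": ["A", "B"], "rack2": ["C", "D"]}))
```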


> In case yes, if I have a cluster of 5 nodes, 4 of them run DN and TT and
> one
> (node A) just run DN, when running MR jobs, map tasks would never read from
> node A? This would be because maps have data locality and if the reduce
> tasks write first to the node where they live, one replica of the block
> would always be in a node that has a TT. Node A would just contain blocks
> created from replication by the framework as no reduce task would write
> there directly. Is this correct?
>

I believe it's possible that a map task would read from node A's DN.  Yes,
the JobTracker tries to schedule map tasks on nodes where the data is
local, but it can't always do so.  If there's a node with a free map slot,
but that node doesn't hold the data blocks locally, the JobTracker will
still assign a map task to that slot.  Some work done (albeit slower than
the ideal case because of the increased network I/O) is better than no
work done.
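A rough sketch of that scheduling trade-off (illustrative Python, not the actual JobTracker logic; the function and data shapes are assumptions): prefer a pending map task whose input block lives on the node with the free slot, but fall back to any pending task rather than leave the slot idle.

```python
def pick_map_task(free_node, pending_tasks, block_locations):
    """Assign a map task to free_node, preferring data locality.

    pending_tasks:    ordered list of task ids
    block_locations:  task id -> set of nodes holding that task's input block
    Returns (task, "local") if the input is on free_node, otherwise
    (task, "remote") for a non-local read over the network.
    """
    for task in pending_tasks:
        if free_node in block_locations[task]:
            return task, "local"   # ideal: read the block from the local DN
    if pending_tasks:
        return pending_tasks[0], "remote"  # some work beats an idle slot
    return None, None

locations = {"t1": {"A"}, "t2": {"B"}}
print(pick_map_task("B", ["t1", "t2"], locations))  # local pick
print(pick_map_task("C", ["t1", "t2"], locations))  # remote fallback
```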


> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
