Each node is configured to run 8 map tasks. The machines have 2.4 GHz
64-bit quad-core Xeon processors.
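
For reference, the figures in this thread can be checked with a quick calculation. This is only a rough sketch: it assumes binary units (1 GB = 1024 MB), blocks spread evenly across the datanodes, and a remote read that gets the full 50 Mbps link to itself. The variable names are just for illustration.

```python
# Back-of-the-envelope numbers for the setup described in this thread.
# Inputs come from the messages; the unit conventions are assumptions.

INPUT_MB = 200 * 1024        # 200 GB of input, assuming 1 GB = 1024 MB
BLOCK_MB = 128               # HDFS block size
NODES = 20                   # every node is also a datanode
MAP_SLOTS_PER_NODE = 8       # map task capacity per node
LINK_MBPS = 50               # megabits per second, as limited with tc

# One map task per block.
num_maps = INPUT_MB // BLOCK_MB

# Map tasks that can run at the same time across the cluster.
total_slots = NODES * MAP_SLOTS_PER_NODE

# Number of "waves" needed to run all maps (ceiling division).
waves = -(-num_maps // total_slots)

# Blocks held by each node if the data is spread evenly (assumption).
blocks_per_node = num_maps / NODES

# Approximate time to pull one block over the network at full link speed,
# ignoring protocol overhead and the MB/MiB distinction.
seconds_per_remote_block = BLOCK_MB * 8 / LINK_MBPS

print("map tasks:", num_maps)
print("concurrent map slots:", total_slots)
print("waves of maps:", waves)
print("blocks per node (even spread):", blocks_per_node)
print("seconds per remote block read:", seconds_per_remote_block)
```

So a non-local map has to spend on the order of 20 seconds just fetching its block, and more when several remote reads share one link, which is why the fraction of data-local maps matters so much here.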

-Virajith

On Tue, Jul 12, 2011 at 2:05 PM, Sudharsan Sampath <sudha...@gmail.com> wrote:

> What's the map task capacity of each node?
>
> On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti
> <virajit...@gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to run the Sort example in Hadoop-0.20.2 over 200 GB of input
>> data on a 20-node cluster. HDFS is configured with a 128 MB block size
>> (so 1600 map tasks are created) and a replication factor of 1. All 20
>> nodes are also HDFS datanodes. I limited the bandwidth between the nodes
>> to 50 Mbps (configured using Linux "tc"). I see that around 90% of the
>> map tasks are reading data over the network, i.e. most of the map tasks
>> are not being scheduled on the nodes where their input data is located.
>> My understanding was that Hadoop tries to schedule as many data-local maps
>> as possible, but that does not seem to happen here. Is there a reason
>> why this is happening? And is there a way to configure Hadoop to
>> ensure the maximum possible node locality?
>> Any help with this is very much appreciated.
>>
>> Thanks,
>> Virajith
>>
>
>
