Each node is configured to run 8 map tasks. I am using machines with 2.4 GHz 64-bit quad-core Xeon processors.
-Virajith

On Tue, Jul 12, 2011 at 2:05 PM, Sudharsan Sampath <sudha...@gmail.com> wrote:

> What's the map task capacity of each node?
>
> On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti
> <virajit...@gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input
>> data using a 20-node cluster. HDFS is configured to use a 128MB block
>> size (so 1600 maps are created) and a replication factor of 1. All 20
>> nodes are also HDFS datanodes. I was using a bandwidth of 50Mbps between
>> each pair of nodes (configured using Linux "tc"). I see that around 90% of
>> the map tasks are reading data over the network, i.e. most of the map
>> tasks are not being scheduled on the nodes where their input data is
>> located.
>>
>> My understanding was that Hadoop tries to schedule as many data-local maps
>> as possible, but in this situation that does not seem to happen. Is there
>> any reason why this is happening? And is there a way to configure Hadoop
>> to ensure the maximum possible node locality?
>>
>> Any help regarding this is very much appreciated.
>>
>> Thanks,
>> Virajith
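For reference, here is the arithmetic behind the job shape described in this thread, as a small sketch. The function name `job_shape` and the constant names are hypothetical; it assumes 1 GB = 1024 MB and that HDFS spreads the blocks evenly across the 20 datanodes, which the thread does not actually report.

```python
# Worked arithmetic for the Sort job described in this thread (Hadoop 0.20.2).
# Hypothetical names; assumes 1 GB = 1024 MB and an even spread of blocks
# across datanodes (an assumption, not something measured in the thread).

INPUT_GB = 200            # total Sort input
BLOCK_MB = 128            # HDFS block size
NODES = 20                # cluster size (every node is also a datanode)
MAP_SLOTS_PER_NODE = 8    # map task capacity per node
REPLICATION = 1           # one copy of each block


def job_shape(input_gb, block_mb, nodes, slots_per_node, replication):
    """Return (map tasks, concurrent map slots, waves, block replicas per node)."""
    input_mb = input_gb * 1024
    maps = -(-input_mb // block_mb)        # ceiling division: one map per block
    slots = nodes * slots_per_node         # maps that can run at the same time
    waves = -(-maps // slots)              # rounds of maps needed to finish
    replicas_per_node = maps * replication / nodes  # assumes an even spread
    return maps, slots, waves, replicas_per_node


maps, slots, waves, per_node = job_shape(
    INPUT_GB, BLOCK_MB, NODES, MAP_SLOTS_PER_NODE, REPLICATION
)
print(f"map tasks: {maps}")
print(f"concurrent map slots: {slots}")
print(f"waves: {waves}")
print(f"block replicas per node: {per_node}")
```

This gives 1600 maps, 160 concurrent slots, 10 waves and about 80 blocks per node. With a replication factor of 1, each map has exactly one node on which it can run data-local, so any slot that frees up on a node whose local blocks have already been processed can only be filled with a remote read.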