As an addendum, running a DataNode on the same machine as a NameNode is generally considered a bad idea because it hurts the NameNode's ability to maintain high throughput.
- Aaron

On Thu, Jun 18, 2009 at 1:26 PM, Aaron Kimball <aa...@cloudera.com> wrote:
> Did you run the dfs put commands from the master node? If you're inserting
> into HDFS from a machine running a DataNode, the local DataNode will always
> be chosen as one of the three replica targets. For more balanced loading,
> you should use an off-cluster machine as the point of origin.
>
> If you experience uneven block distribution, you should also periodically
> rebalance your cluster by running bin/start-balancer.sh. It will work in
> the background to move blocks from heavily laden nodes to underutilized
> ones.
>
> - Aaron
>
> On Thu, Jun 18, 2009 at 12:57 PM, openresearch <
> qiming...@openresearchinc.com> wrote:
>
>> Hi all,
>>
>> I "dfs put" a large dataset onto a 10-node cluster.
>>
>> When I observe the Hadoop progress (via web:50070) and each local file
>> system (via df -k), I notice that my master node is hit 5-10 times harder
>> than the others, so its hard drive fills up more quickly than theirs.
>> During last night's load, it actually crashed when the hard drive was
>> full.
>>
>> To my understanding, data should be spread across all nodes evenly (in a
>> round-robin fashion, using 64 MB blocks as the unit).
>>
>> Is this the expected behavior of Hadoop? Can anyone suggest a good way
>> to troubleshoot it?
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://www.nabble.com/HDFS-is-not-loading-evenly-across-all-nodes.-tp24099585p24099585.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
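The skew Aaron describes can be sketched with a toy simulation (illustrative only, not Hadoop code; the node names, node count, and helper functions here are made up): when the writer runs a DataNode, the first of the three replicas always lands locally, so the writer node ends up holding a copy of every block while the remaining replicas spread across the rest of the cluster.

```python
import random

def place_block(writer, nodes, replication=3):
    """Pick target nodes for one block, mimicking HDFS's writer-local
    placement: first replica on the local DataNode, the rest elsewhere."""
    targets = [writer]  # first replica always lands on the writer's DataNode
    others = [n for n in nodes if n != writer]
    targets += random.sample(others, replication - 1)
    return targets

def load_blocks(writer, nodes, num_blocks):
    """Count how many block replicas each node holds after a large put."""
    counts = {n: 0 for n in nodes}
    for _ in range(num_blocks):
        for n in place_block(writer, nodes):
            counts[n] += 1
    return counts

# A hypothetical 10-node cluster, writing from the master (which also
# runs a DataNode).
nodes = ["master"] + ["slave%d" % i for i in range(1, 10)]
counts = load_blocks("master", nodes, num_blocks=1000)
print(counts)  # "master" holds every block; each slave holds only ~2/9 of them
```

Writing from an off-cluster machine removes the writer from the candidate list, so all three replicas are chosen across the cluster and the load evens out, which is exactly the fix suggested above.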