Hi Ted,

This is interesting. I assume that hadoop1 through hadoop11 are the newly added nodes. Could you please give me more information about your HDFS cluster? What is the topology of the cluster, i.e. how many racks does it have, and which machines belong to which rack? Were the new nodes all added to the cluster at the same time, or were hadoop10 and hadoop11 added later?
Hairong

-----Original Message-----
> From: Ted Dunning <[EMAIL PROTECTED]>
> Reply-To: <hadoop-user@lucene.apache.org>
> Date: Wed, 19 Sep 2007 19:46:14 -0700
> To: <hadoop-user@lucene.apache.org>
> Conversation: Statistically bad distribution of blocks
> Subject: Statistically bad distribution of blocks
>
> I just added 10 datanodes to a small cluster and turned up the
> replication on many of the files to balance the storage out a bit.
>
> I expected to see a uniform-ish distribution of blocks on the new nodes.
> This is what I got instead:
>
> Node         Last Contact  State       Size (GB)  Used (%)  Blocks
> hadoop1      0             In Service  42.68      72.36     585
> hadoop10     1             In Service  42.68      50.30     354
> hadoop11     2             In Service  42.68      48.02     340
> hadoop2      2             In Service  42.68      73.01     597
> hadoop3      2             In Service  42.68      72.68     614
> hadoop6      0             In Service  42.68      72.87     578
> hadoop7      0             In Service  42.68      72.38     600
> hadoop8      2             In Service  42.68      72.30     593
> hadoop9      2             In Service  42.68      72.70     637
> metricsapp1  0             In Service  257.98     90.52     4134
> metricsapp2  0             In Service  257.98     40.23     2338
> metricsapp3  2             In Service  247.20     39.41     2889
> metricsapp4  2             In Service  257.98     98.44     5096
>
> The right-most column is what we are interested in here. Note how
> hadoop10 and hadoop11 have significantly fewer blocks than the others.
> Statistically we should expect that the counts should vary less than
> about
>
> 2 * sqrt(600) = 50
>
> Indeed, most of them do. But those two do not.
>
> Is there some hidden significance in the names of nodes?
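
For reference, the arithmetic in the quoted message can be checked with a short script. This is only a rough sketch: it assumes, as the quoted message implicitly does, that per-node block counts behave roughly like Poisson counts, so the standard deviation is about the square root of the expected count. The function name `outliers` and the use of 600 as the expected count are illustrative choices, not anything taken from Hadoop.

```python
import math

# Block counts for the hadoopN nodes, copied from the table in the quoted message.
blocks = {
    "hadoop1": 585, "hadoop10": 354, "hadoop11": 340,
    "hadoop2": 597, "hadoop3": 614, "hadoop6": 578,
    "hadoop7": 600, "hadoop8": 593, "hadoop9": 637,
}

def outliers(counts, expected=600, n_sigma=2.0):
    """Return the nodes whose count differs from `expected` by more than
    n_sigma * sqrt(expected) (assumption: Poisson-like counts)."""
    tolerance = n_sigma * math.sqrt(expected)
    return {node: c for node, c in counts.items() if abs(c - expected) > tolerance}

tolerance = 2 * math.sqrt(600)
print(round(tolerance, 1))       # 49.0, i.e. "about 50"
print(sorted(outliers(blocks)))  # ['hadoop10', 'hadoop11']
```

Under that assumption only hadoop10 and hadoop11 fall outside the band, which matches the observation in the quoted message. The metricsapp nodes are left out because their disks are a different size.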