RE: Statistically bad distribution of blocks

Hairong Kuang Mon, 24 Sep 2007 17:41:51 -0700

Hi Ted,

This is interesting. I assume that hadoop1-hadoop11 are the newly added nodes.
Could you please give me more information about your HDFS cluster? What is
the topology of the cluster, i.e. how many racks does it have, and which
machines belong to which rack? Were all the new nodes added at the same time,
or were hadoop10 and hadoop11 added later?


Hairong

-----Original Message-----
> From: Ted Dunning <[EMAIL PROTECTED]>
> Reply-To: <hadoop-user@lucene.apache.org>
> Date: Wed, 19 Sep 2007 19:46:14 -0700
> To: <hadoop-user@lucene.apache.org>
> Conversation: Statistically bad distribution of blocks
> Subject: Statistically bad distribution of blocks
> 
> 
> I just added 10 datanodes to a small cluster and turned up the 
> replication on many of the files to balance the storage out a bit.
> 
> I expected to see a uniform-ish distribution of blocks on the new nodes.
> This is what I got instead:
> 
> Node         Last Contact  State       Size (GB)  Used (%)  Blocks
> hadoop1                 0  In Service      42.68     72.36     585
> hadoop10                1  In Service      42.68     50.30     354
> hadoop11                2  In Service      42.68     48.02     340
> hadoop2                 2  In Service      42.68     73.01     597
> hadoop3                 2  In Service      42.68     72.68     614
> hadoop6                 0  In Service      42.68     72.87     578
> hadoop7                 0  In Service      42.68     72.38     600
> hadoop8                 2  In Service      42.68     72.30     593
> hadoop9                 2  In Service      42.68     72.70     637
> metricsapp1             0  In Service     257.98     90.52    4134
> metricsapp2             0  In Service     257.98     40.23    2338
> metricsapp3             2  In Service     247.20     39.41    2889
> metricsapp4             2  In Service     257.98     98.44    5096
> 
> The right-most column is what we are interested in here.  Note how 
> hadoop10 and hadoop11 have significantly fewer blocks than the others.  
> Statistically, with roughly 600 blocks per node, we should expect the
> counts to vary by less than about
> 
>   2 * sqrt(600) ~= 49
> 
> Indeed, most of them do.  But those two do not.
> 
> Is there some hidden significance in the names of nodes?
> 
>
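
The check described in the quoted message can be sketched in a few lines of
Python. This is only an illustration, not something from the original thread:
it assumes the per-node block counts behave roughly like Poisson counts with a
mean of about 600 (so the standard deviation is about sqrt(600)), and the
function name find_outliers is made up for the example. The counts are copied
from the hadoop* rows of the table above.

```python
import math

# Block counts for the hadoop* nodes, copied from the table in the quoted message.
blocks = {
    "hadoop1": 585, "hadoop10": 354, "hadoop11": 340,
    "hadoop2": 597, "hadoop3": 614, "hadoop6": 578,
    "hadoop7": 600, "hadoop8": 593, "hadoop9": 637,
}

def find_outliers(counts, expected, n_sigma=2.0):
    """Return (threshold, names of nodes deviating from `expected` by more than it).

    Assumption: counts are roughly Poisson, so the standard deviation is
    sqrt(expected) and the threshold is n_sigma * sqrt(expected).
    """
    threshold = n_sigma * math.sqrt(expected)
    outliers = sorted(
        node for node, count in counts.items()
        if abs(count - expected) > threshold
    )
    return threshold, outliers

threshold, outliers = find_outliers(blocks, expected=600)
print(f"threshold = {threshold:.1f} blocks")
print("outside the threshold:", outliers)
```

With these numbers, only hadoop10 and hadoop11 fall outside the two-sigma
band; the other nine nodes are all within it.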
