Actually, I was assuming that the entire cluster participates in the
rebalancing.  Replication is not done disk-wise in Hadoop but block-wise.
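
For what it's worth, here is a rough sketch of the arithmetic behind the
figures quoted below. It assumes full drives, decimal units (1 TB = 1e12
bytes), and that every node contributes its 30 MB/s at once; the function
names are made up for illustration and are not part of any Hadoop API.

```python
def failures_per_day(nodes, disks_per_node, mtbf_days):
    """Expected drive failures per day across the whole cluster."""
    return nodes * disks_per_node / mtbf_days


def rereplication_minutes(data_tb, participating_nodes, mb_per_s_per_node):
    """Minutes to re-copy data_tb terabytes when participating_nodes
    each contribute mb_per_s_per_node of bandwidth in parallel."""
    total_bytes = data_tb * 1e12
    aggregate_bytes_per_s = participating_nodes * mb_per_s_per_node * 1e6
    return total_bytes / aggregate_bytes_per_s / 60


# Assumed configuration from the thread: 100 nodes x 10 disks x 2 TB,
# drive MTBF of 1000 days, 30 MB/s per node dedicated to re-replication.
print(failures_per_day(100, 10, 1000))                # drive failures per day
print(round(rereplication_minutes(2, 100, 30), 1))    # one drive, whole cluster
print(round(rereplication_minutes(20, 100, 30), 1))   # one machine, whole cluster
print(round(rereplication_minutes(2, 1, 30), 1))      # one drive, single source node
```

The last line shows why the block-wise point matters: with a single source
node the same drive would take most of a day rather than minutes.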

On Wednesday, August 10, 2011, Rajiv Chittajallu <raj...@yahoo-inc.com>
wrote:
> Ted Dunning wrote on 08/10/11 at 10:40:30 -0700:
>>To be specific, taking a 100 node x 10 disk x 2 TB configuration with drive
>>MTBF of 1000 days, we should be seeing drive failures on average once per
>>day.  With 1G ethernet and 30MB/s/node dedicated to re-replication, it will
>>take just over 10 minutes to restore replication of a single drive and will
>>take just over 100 minutes to restore replication of an entire machine.
>
> You are assuming that only one good node is used to restore replication for
> all the blocks on the failed drive, which is very unlikely. With a
> replication factor of 3, you will have at least 2 nodes to choose from
> in the worst case and many more in a standard cluster.
>
> And when you have more spindles, 6+, one would probably consider
> using the second GigE port, which is standard on most of the commodity
> gear out there.
>
>
>
