Re: How to replace the storage on a datanode without formatting the namenode?

jason hadoop Thu, 14 May 2009 21:23:11 -0700

You can have separate configuration files for the different datanodes.

If you are willing to deal with the complexity you can manually start them
with altered properties from the command line.

rsync or other means of sharing identical configs is simple and common.
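
To make the per-datanode configuration idea concrete, here is a minimal sketch that renders a separate config fragment for each datanode, differing only in dfs.data.dir. The host names and paths are hypothetical, and which file the fragment belongs in (hadoop-site.xml or hdfs-site.xml) depends on your Hadoop version, so treat this as an illustration rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET


def render_site_config(data_dirs):
    """Return Hadoop-style configuration XML that sets dfs.data.dir."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    # dfs.data.dir takes a comma-separated list of directories.
    ET.SubElement(prop, "value").text = ",".join(data_dirs)
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts and storage paths, for illustration only.
per_node_dirs = {
    "node1": ["/local/hdfs/data"],
    "node2": ["/mnt/network/hdfs/data"],
}

# One config fragment per datanode; push each to its own host
# instead of rsyncing one identical file everywhere.
configs = {host: render_site_config(dirs) for host, dirs in per_node_dirs.items()}
for host, xml_text in configs.items():
    print(host, xml_text)
```

Each datanode then reads its own copy, so one node can point at local disk while another points at network storage.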

Raghu, your technique will only work well if you can complete steps 1-4 in
less than the datanode timeout interval, which may be feasible in Alexandra's
case. I believe the timeout is 10 minutes.
If you pass the timeout interval the namenode will start to re-replicate the
blocks, and when the datanode comes back it will delete all of the blocks
that were re-replicated elsewhere in the meantime.
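
As a rough check on that window, the sketch below computes the interval after which the namenode gives up on a silent datanode and tests whether a measured swap time fits inside it. It assumes the formula and defaults as I recall them from the namenode source of this era (twice heartbeat.recheck.interval, default 5 minutes, plus ten times dfs.heartbeat.interval, default 3 seconds); verify both against your own version and configuration. The function names and the safety margin are made up for illustration.

```python
def heartbeat_expire_seconds(recheck_interval_ms=5 * 60 * 1000,
                             heartbeat_interval_s=3):
    """Seconds of silence before the namenode treats a datanode as dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    total_ms = 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000
    return total_ms / 1000.0


def fits_in_window(swap_seconds, safety_margin_s=60):
    """True if stop + rsync + start should finish before the datanode expires."""
    return swap_seconds + safety_margin_s < heartbeat_expire_seconds()


print(heartbeat_expire_seconds())  # 630.0 with the assumed defaults
print(fits_in_window(300))         # True: a 5 minute swap leaves headroom
print(fits_in_window(600))         # False: 10 minutes plus margin is too long
```

With those assumed defaults the window comes out at about ten and a half minutes, so time a trial rsync of the data directory before relying on this approach.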

On Thu, May 14, 2009 at 11:35 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

>
> Along these lines, an even simpler approach, I would think, is:
>
> 1) set data.dir to local and create the data.
> 2) stop the datanode
> 3) rsync local_dir network_dir
> 4) start the datanode with data.dir set to network_dir
>
> There is no need to format or rebalance.
>
> This way you can switch between local and network multiple times (without
> needing to rsync data, if there are no changes made in the tests)
>
> Raghu.
>
>
> Alexandra Alecu wrote:
>
>> Another possibility I am thinking about now, which is suitable for me as I
>> do not actually have much data stored in the cluster when I want to perform
>> this switch, is to set the replication level really high and then simply
>> remove the local storage locations and restart the cluster. With a bit of
>> luck the high level of replication will allow a full recovery of the
>> cluster on restart.
>>
>> Is this something that you would advise?
>>
>> Many thanks,
>> Alexandra.
>>
>
>

-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
