jason hadoop wrote:

Raghu, your technique will only work well if you can complete steps 1-4 in
less than the datanode timeout interval, which may be valid for Alexandra.
I believe the timeout is 10 minutes.

Correct, I should have mentioned that as well: the process should finish within 10 minutes. Alternatively, the user could stop the NameNode or the entire cluster during this time, or swap steps (2) and (3).

Raghu.

If you pass the timeout interval the namenode will start to rebalance the
blocks, and when the datanode comes back it will delete all of the blocks it
has rebalanced.
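
To make the timing constraint concrete, here is a rough back-of-the-envelope check, written as a small Python sketch. The 10-minute figure is the one quoted in this thread, not a verified default. The data size, copy throughput, and restart overhead are hypothetical inputs you would measure yourself.

```python
# Timeout figure as quoted in this thread; an assumption, check your own cluster.
DATANODE_TIMEOUT_S = 10 * 60


def fits_in_timeout(data_bytes, throughput_bytes_per_s, overhead_s,
                    timeout_s=DATANODE_TIMEOUT_S):
    """Return True if copying the data plus the stop/start overhead is
    estimated to finish before the datanode is declared dead."""
    copy_s = data_bytes / throughput_bytes_per_s
    return copy_s + overhead_s < timeout_s


if __name__ == "__main__":
    gib = 1024 ** 3
    mib = 1024 ** 2
    # Hypothetical numbers: 10 GiB of block data, 50 MiB/s copy rate,
    # 60 s to stop and restart the datanode.
    print(fits_in_timeout(10 * gib, 50 * mib, 60))
```

If the estimate does not fit, the alternatives mentioned above apply: stop the NameNode or the whole cluster for the duration of the copy.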

On Thu, May 14, 2009 at 11:35 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

Along these lines, an even simpler approach, I would think, is:

1) set data.dir to local and create the data.
2) stop the datanode
3) rsync local_dir network_dir
4) start the datanode with data.dir set to network_dir

There is no need to format or rebalance.

This way you can switch between local and network multiple times (without
needing to rsync the data again, if no changes are made in the tests).
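
Steps 2-4 above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the stock bin/hadoop-daemon.sh script is used to stop and start the datanode, that the data directory property (dfs.data.dir in the site configuration) has already been edited to point at network_dir before the restart, and that the paths shown are hypothetical placeholders. It uses rsync -a with trailing slashes so the directory contents are copied rather than nested.

```python
import subprocess
import time

# Timeout figure as quoted in this thread; an assumption, check your own cluster.
DATANODE_TIMEOUT_S = 10 * 60


def plan_switch(local_dir, network_dir, hadoop_home):
    """Build the commands for steps 2-4 as argument lists (nothing is run)."""
    daemon = f"{hadoop_home}/bin/hadoop-daemon.sh"
    return [
        # Step 2: stop the datanode so the block files are no longer changing.
        [daemon, "stop", "datanode"],
        # Step 3: copy the block data; trailing slashes copy the contents.
        ["rsync", "-a", local_dir.rstrip("/") + "/", network_dir.rstrip("/") + "/"],
        # Step 4: restart; assumes the config already points at network_dir.
        [daemon, "start", "datanode"],
    ]


def run_switch(commands, timeout_s=DATANODE_TIMEOUT_S):
    """Run the commands in order and warn if the total time exceeds the timeout."""
    start = time.monotonic()
    for cmd in commands:
        subprocess.run(cmd, check=True)  # abort on the first failing step
    elapsed = time.monotonic() - start
    if elapsed > timeout_s:
        print(f"warning: took {elapsed:.0f}s, longer than the {timeout_s}s timeout")
    return elapsed


if __name__ == "__main__":
    # Hypothetical paths, for illustration only; prints the plan without running it.
    for cmd in plan_switch("/data/local", "/mnt/net/data", "/opt/hadoop"):
        print(" ".join(cmd))
```

Keeping the plan separate from the execution makes it easy to print and review the commands before touching a live datanode.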

Raghu.


Alexandra Alecu wrote:

Another possibility I am thinking about now, which is suitable for me as I
do not actually have much data stored in the cluster when I want to perform
this switch, is to set the replication level really high and then simply
remove the local storage locations and restart the cluster. With a bit of
luck, the high level of replication will allow a full recovery of the
cluster on restart.

Is this something that you would advise?

Many thanks,
Alexandra.




