You can have separate configuration files for the different datanodes. If you are willing to deal with the complexity, you can manually start them with altered properties from the command line.
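A minimal Python sketch of the separate-configuration-files approach. It renders one Hadoop-style `<configuration>` document per datanode, identical except for `dfs.data.dir`. The node names, directory paths and namenode URI are hypothetical. `dfs.data.dir` and `fs.default.name` are the property names used by Hadoop releases of this period, so check them against your version.

```python
import xml.etree.ElementTree as ET


def render_site_config(properties):
    """Render a Hadoop-style <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def per_datanode_configs(shared, data_dirs):
    """Return {datanode: xml_text}; only dfs.data.dir differs between nodes."""
    configs = {}
    for node, data_dir in data_dirs.items():
        props = dict(shared)              # copy so the shared settings stay untouched
        props["dfs.data.dir"] = data_dir  # the one per-node override
        configs[node] = render_site_config(props)
    return configs


if __name__ == "__main__":
    # Hypothetical cluster: one datanode on local disk, one on network storage.
    shared = {"fs.default.name": "hdfs://namenode.example:9000"}
    dirs = {
        "datanode1": "/local/hadoop/dfs/data",
        "datanode2": "/net/hadoop/dfs/data",
    }
    for node, xml_text in per_datanode_configs(shared, dirs).items():
        print(node, xml_text)
```

Each rendered document would be written into that node's own configuration directory. Because every file is generated from the same shared settings, everything except the storage location stays identical across nodes.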
rsync or another means of sharing identical configs is simple and common.

Raghu, your technique will only work well if you can complete steps 1-4 in less than the datanode timeout interval, which may be feasible for Alexandra. I believe the timeout is about 10 minutes. If you exceed the timeout interval, the namenode will start re-replicating that datanode's blocks onto other nodes, and when the datanode comes back, those blocks will be over-replicated and the excess copies will be deleted.

On Thu, May 14, 2009 at 11:35 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

> Along these lines, an even simpler approach, I would think, is:
>
> 1) set data.dir to local and create the data.
> 2) stop the datanode
> 3) rsync local_dir network_dir
> 4) start the datanode with data.dir set to network_dir
>
> There is no need to format or rebalance.
>
> This way you can switch between local and network multiple times (without
> needing to rsync data, if there are no changes made in the tests).
>
> Raghu.
>
> Alexandra Alecu wrote:
>
>> Another possibility I am thinking about now, which is suitable for me as I do
>> not actually have much data stored in the cluster when I want to perform
>> this switch, is to set the replication level really high and then simply
>> remove the local storage locations and restart the cluster. With a bit of
>> luck, the high level of replication will allow a full recovery of the cluster
>> on restart.
>>
>> Is this something that you would advise?
>>
>> Many thanks,
>> Alexandra.

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
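The timeout argument in the reply can be sanity-checked with a short Python sketch. It assumes the namenode declares a datanode dead after 2 × heartbeat.recheck.interval + 10 × dfs.heartbeat.interval, with assumed defaults of 5 minutes and 3 seconds. That gives 630 seconds, close to the 10-minute estimate. Only the time the datanode is actually down counts against this limit: stopping it, the rsync, and restarting it. The step timings and the `safety_factor` margin are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatSettings:
    # Assumed defaults; verify against your cluster's configuration.
    recheck_interval_ms: int = 5 * 60 * 1000  # heartbeat.recheck.interval
    heartbeat_interval_s: int = 3             # dfs.heartbeat.interval


def dead_node_timeout_s(settings):
    """Seconds without a heartbeat before the namenode treats a datanode as dead."""
    return 2 * settings.recheck_interval_ms / 1000 + 10 * settings.heartbeat_interval_s


def switchover_fits(downtime_steps_s, settings=None, safety_factor=0.5):
    """True if stop + rsync + restart finish within safety_factor * timeout.

    safety_factor is a hypothetical margin, not a Hadoop setting.
    """
    settings = settings or HeartbeatSettings()
    return sum(downtime_steps_s) <= safety_factor * dead_node_timeout_s(settings)


if __name__ == "__main__":
    defaults = HeartbeatSettings()
    print("timeout:", dead_node_timeout_s(defaults), "s")  # 630.0 s, about 10.5 minutes
    # Hypothetical timings in seconds for: stop datanode, rsync, start datanode.
    print(switchover_fits([20, 240, 30]))  # 290 s vs. 315 s budget -> True
    print(switchover_fits([20, 500, 30]))  # 550 s vs. 315 s budget -> False
```

Keeping a margin below the full timeout leaves room for an rsync that runs slower than expected. If the namenode starts re-replicating, the wasted copying described above is exactly what the switch was meant to avoid.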