This is the kind of use case I was looking for (persistent HDFS across EC2 cluster restarts).
Correct me if I am wrong, but I probably don't even need to take snapshots
if I am bringing down and restarting the entire EC2 cluster. I am using
Cloudera's hadoop-ec2 launch/terminate scripts to start and shut down my
Hadoop clusters running on EC2. Is there anything additional I need to do
(while bringing up the cluster) to use the files previously stored in HDFS
(i.e., on the EBS volumes)? Do we need the EXACT same number of
nodes/slaves, or is that not required? Anything beyond that? Or can any
slave be attached to any EBS volume (without maintaining any history)? I
didn't see any wiki/article on this particular use case (except some mail
threads in hbase). I think this requirement is probably specific to HDFS
(and hence affects HBase and Hive).

-Prasen

On Thu, Mar 4, 2010 at 7:26 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> Kevin,
>
> Are you using EBS? If yes, just take a snapshot of your volumes, and
> create new volumes from the snapshot.
>
> Regards,
> Vaibhav Puranik
> GumGum
>
> On Wed, Mar 3, 2010 at 1:12 PM, Jonathan Gray <jl...@streamy.com> wrote:
>
>> Kevin,
>>
>> Taking writes during the transition time will be the issue.
>>
>> If you don't take any writes, then you can flush all your tables and do
>> an HDFS copy the same way. HBase doesn't actually have to be shut down;
>> that's just recommended to prevent things from changing mid-backup. If
>> you're careful not to write data, it should be OK.
>>
>> JG
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Wednesday, March 03, 2010 11:40 AM
>> To: hbase-u...@hadoop.apache.org
>> Subject: Re: HFile backup while cluster running
>>
>> If you disable writing, you can use
>> org.apache.hadoop.hbase.mapreduce.Export to export all your data, copy
>> it to your new HDFS, then use org.apache.hadoop.hbase.mapreduce.Import,
>> and finally switch your clients to the new HBase cluster.
>>
>> On Wed, Mar 3, 2010 at 11:27 AM, Kevin Peterson <kpeter...@biz360.com> wrote:
>>
>> > My current setup in EC2 is a Hadoop MapReduce cluster and an HBase
>> > cluster sharing the same HDFS. That is, I have a batch of nodes that
>> > run datanode and tasktracker, and a bunch of nodes that run datanode
>> > and regionserver. I'm trying to move HBase off this cluster to a new
>> > cluster with its own HDFS.
>> >
>> > My plan is to shut down the cluster, copy the HFiles using distcp, and
>> > then start up the new cluster. My problem is that it looks like it
>> > will take several hours to transfer the more than 1 TB of data. I
>> > don't want to be offline that long. Is it possible to copy the HFiles
>> > while the cluster is up? Do I need to take any special precautions? My
>> > plan would be to turn off any jobs that write, take what tables I can
>> > offline, and leave the critical tables online but serving only reads.
>> >
>> > Jonathan Gray mentioned he has copied the files with HBase running
>> > successfully in https://issues.apache.org/jira/browse/HBASE-1684
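
To make the suggestions in this thread concrete, a few illustrative
sketches follow; every cluster name, ID, path, and address in them is a
placeholder, not something taken from the thread.

The launch/terminate cycle Prasen describes is driven by Cloudera's
hadoop-ec2 script, roughly as below (syntax recalled from the contrib-era
scripts, so treat it as a sketch; whether the script re-attaches EBS
volumes to arbitrary slaves is exactly the open question above):

  # bring up a named cluster with ten slaves
  hadoop-ec2 launch-cluster my-hbase-cluster 10

  # ... use the cluster ...

  # tear the instances down; EBS volumes persist independently of instances
  hadoop-ec2 terminate-cluster my-hbase-cluster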
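
Vaibhav's snapshot-and-recreate path corresponds to the standard EC2
volume operations, shown here with the current aws CLI (the ec2-* API
tools of the time had equivalent commands):

  # snapshot the EBS volume backing one datanode's HDFS data directory
  aws ec2 create-snapshot --volume-id vol-0abc123 --description "hdfs datanode 1"

  # later, create a fresh volume from that snapshot in the target zone...
  aws ec2 create-volume --snapshot-id snap-0def456 --availability-zone us-east-1a

  # ...and attach it to the replacement datanode instance
  aws ec2 attach-volume --volume-id vol-0new789 --instance-id i-0aaa111 --device /dev/sdf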
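
Jonathan's flush-then-copy approach (and the distcp step in Kevin's plan)
combines the HBase shell's flush command with a cluster-to-cluster distcp;
the table name and namenode addresses here are made up:

  # from the HBase shell, force each table's in-memory edits out to HFiles
  echo "flush 'mytable'" | hbase shell

  # then copy the HBase root directory to the new cluster's HDFS
  hadoop distcp hdfs://old-namenode:8020/hbase hdfs://new-namenode:8020/hbase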
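
Ted's Export/Import route runs as MapReduce jobs from the command line;
note that Import expects the table to already exist on the destination
cluster, and the table name and paths below are placeholders:

  # on the old cluster: dump the table to sequence files on HDFS
  hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable

  # move the dump across (distcp again), then on the new cluster:
  hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backup/mytable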