This is the kind of use case I was looking for (persistent HDFS
across EC2 cluster restarts).

Correct me if I am wrong, but I probably don't even need to take
snapshots if I am bringing down and restarting the entire EC2
cluster. I am using Cloudera's hadoop-ec2 launch/terminate cluster
scripts to start and shut down my Hadoop clusters running on EC2.
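For reference, those scripts are invoked roughly like this (the
cluster name and slave count below are just placeholders):

  hadoop-ec2 launch-cluster my-hbase-cluster 10
  hadoop-ec2 terminate-cluster my-hbase-cluster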

Now, is there anything additional I need to do while bringing the
cluster back up in order to use the files previously stored in HDFS
(i.e. on the EBS volumes)? Do we need the EXACT same number of
nodes/slaves, or is that not required? Anything beyond that? Or can
any slave be attached to any EBS volume, without maintaining any
mapping between slaves and volumes?
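For context, this is roughly what I assume each slave would look like
after re-attaching its volume (the volume/instance ids, device and
mount point are placeholders, and pointing dfs.data.dir at the EBS
mount is just my assumption of how the persistence works):

  # re-attach and mount the slave's EBS volume
  ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdh
  mount /dev/sdh /mnt/ebs

  # hdfs-site.xml on the slave points the datanode at the EBS mount
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/ebs/hadoop/dfs/data</value>
  </property>

  # (similarly, dfs.name.dir on the master would need to live on an
  # EBS volume for the namenode metadata to survive a restart)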

I didn't see any wiki page or article on this particular use case
(apart from some mail threads on the HBase list). I think this
requirement is probably specific to HDFS (and hence affects HBase and
Hive as well).

-Prasen


On Thu, Mar 4, 2010 at 7:26 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> Kevin,
>
> Are you using EBS? If yes, just take a snapshot of your volumes. And create
> new volumes from the snapshot.
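> Something along these lines with the EC2 API tools should do it (the
> volume id, snapshot id and availability zone are placeholders):
>
>   ec2-create-snapshot vol-xxxxxxxx
>   ec2-create-volume --snapshot snap-xxxxxxxx -z us-east-1a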
>
> Regards,
> Vaibhav Puranik
> GumGum
>
> On Wed, Mar 3, 2010 at 1:12 PM, Jonathan Gray <jl...@streamy.com> wrote:
>
>> Kevin,
>>
>> Taking writes during the transition time will be the issue.
>>
>> If you don't take any writes, then you can flush all your tables and do an
>> HDFS copy the same way.  HBase doesn't actually have to be shut down, that's
>> just recommended to prevent things from changing mid-backup.  If you're
>> careful to not write data it should be ok.
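>> Roughly (the table name and namenode addresses below are placeholders;
>> flush each table from the hbase shell, then copy the HBase root dir
>> with distcp):
>>
>>   echo "flush 'my_table'" | hbase shell
>>   hadoop distcp hdfs://old-namenode:8020/hbase hdfs://new-namenode:8020/hbase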
>>
>> JG
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Wednesday, March 03, 2010 11:40 AM
>> To: hbase-u...@hadoop.apache.org
>> Subject: Re: HFile backup while cluster running
>>
>> If you disable writing, you can use
>> org.apache.hadoop.hbase.mapreduce.Export
>> to export all your data, copy them to your new HDFS, then use
>> org.apache.hadoop.hbase.mapreduce.Import, finally switch your clients to
>> the
>> new HBase cluster.
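>> For example (the table name, export path and namenode addresses are
>> placeholders, and the Import is run against the new cluster):
>>
>>   hbase org.apache.hadoop.hbase.mapreduce.Export my_table /export/my_table
>>   hadoop distcp hdfs://old-namenode:8020/export/my_table hdfs://new-namenode:8020/export/my_table
>>   hbase org.apache.hadoop.hbase.mapreduce.Import my_table /export/my_table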
>>
>> On Wed, Mar 3, 2010 at 11:27 AM, Kevin Peterson <kpeter...@biz360.com
>> >wrote:
>>
>> > My current setup in EC2 is a Hadoop Map Reduce cluster and HBase
>> > cluster sharing the same HDFS. That is, I have a batch of nodes that
>> > run datanode and tasktracker and a bunch of nodes that run datanode
>> > and regionserver. I'm trying to move HBase off this cluster to a new
>> > cluster with its own HDFS.
>> >
>> > My plan is to shut down the cluster, copy the HFiles using distcp, and
>> > then start up the new cluster. My problem is that it looks like it
>> > will take several hours to transfer the > 1TB of data. I don't want to
>> > be offline that long. Is it possible to copy the HFiles while the
>> > cluster is up? Do I need to take any special precautions? I think my
>> > plan would be to turn off any jobs writing, take what tables I can
>> > offline, and leave the critical tables online but only serving reads.
>> >
>> > Jonathan Gray mentioned he has copied the files with HBase running
>> > successfully in https://issues.apache.org/jira/browse/HBASE-1684
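>> > For reference, the commands I have in mind (the table name and
>> > namenode addresses are placeholders):
>> >
>> >   echo "disable 'noncritical_table'" | hbase shell
>> >   hadoop distcp hdfs://old-nn:8020/hbase/noncritical_table hdfs://new-nn:8020/hbase/noncritical_table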
>> >
>>
>>
>
