This should be something the operators of your data store worry about. E.g., say HDFS uses three replicas: one should be on a local rack, another on a different rack (to protect against a power outage), and a third in a remote data center...
If you have only a small cluster, then maybe use a UPS to guard against power outages and watch out for storms? After all, what are the chances that a meteorite hits your data center?

-----Original Message-----
From: Charles Mason [mailto:[EMAIL PROTECTED]
Sent: Monday, September 22, 2008 12:13 PM
To: [email protected]
Subject: [LIKELY JUNK]Back Up Strategies

Hi All,

I was wondering what the options are for backing up and dumping an HBase database. I appreciate that running on top of an HDFS cluster protects against individual node failure. However, that still doesn't protect against the massive but thankfully rare disasters which take out whole server racks: fire, floods, etc.

As far as I can tell there are two options:

1. Scan each table and dump every row to some external location, like mysqldump does for MySQL. To recover, simply put the data back. I am sure the performance of this is going to be fairly bad.

2. Image the data stored on the HDFS cluster. Aren't there some big issues with this not grabbing a consistent image, since some updates won't have been flushed? Is there any way to force that, or to make it consistent some way, perhaps via snapshotting?

Have I missed anything? Anyone got any suggestions?

Charlie M
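For what it's worth, option 1 (scan and dump each table) can be sketched roughly as below. This is a minimal illustration only: the in-memory dict stands in for a table, and `dump_table`/`restore_table` are hypothetical helpers, not the real HBase client API, which you'd call via Scan and Put in practice.

```python
import io
import json

# Hypothetical stand-in for a table: row key -> {column: value}.
# A real dump would iterate an HBase Scan instead of a dict.
table = {
    "row1": {"cf:a": "1", "cf:b": "2"},
    "row2": {"cf:a": "3"},
}

def dump_table(table, out):
    """Scan every row and write it to an external location,
    one JSON object per line (mysqldump-style)."""
    for key in sorted(table):
        out.write(json.dumps({"key": key, "cells": table[key]}) + "\n")

def restore_table(inp):
    """Recover by putting each dumped row back into a fresh table."""
    restored = {}
    for line in inp:
        rec = json.loads(line)
        restored[rec["key"]] = rec["cells"]
    return restored

# Round-trip: dump to a buffer, then restore from it.
buf = io.StringIO()
dump_table(table, buf)
buf.seek(0)
assert restore_table(buf) == table
```

Note this shares the consistency caveat from option 2: a dump taken while writes are in flight captures each row at whatever point the scanner reaches it, not a single point-in-time snapshot of the whole table.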
