On Mon, Sep 22, 2008 at 12:13 PM, Charles Mason <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I was wondering what options there are for backing up and dumping an
> HBase database. I appreciate that having it run on top of an HDFS
> cluster can protect against individual node failure. However, that
> still doesn't protect against the massive but thankfully rare
> disasters which take out whole server racks: fire, floods, etc.
There will be something released for this this week :)

> As far as I can tell there are two options:
>
> 1. Scan each table and dump every row to some external location,
> like mysqldump does for MySQL. Then to recover, simply put the
> data back. I am sure the performance of this is going to be fairly
> bad.

It's not as bad as you may think, though we have not tested it on very
large clusters. Depending on your configuration, importing a backup is
usually the most costly operation, as regions split, etc.

> 2. Image the data stored on the HDFS cluster. Aren't there some big
> issues with it not grabbing a consistent image, as some updates won't
> be flushed? Is there any way to force that, or to make it
> consistent some way, perhaps via snapshotting?

That's correct, and we were not able to come up with a good way to
snapshot HBase. It either took much, much longer than dumping the data
out of a table, or gave us inconsistent data. Maybe this will be easier
in a future HBase release, but for now it's probably not something
you'd want to do with production data.

> Have I missed anything? Anyone got any suggestions?
>
> Charlie M

--
Dan Zinngrabe
Alchemist -- Mahalo.com
http://www.mahalo.com/member/quellish
[EMAIL PROTECTED]
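[Editor's note: the scan-and-dump approach in option 1 can be sketched generically. This is only an illustration of the pattern, not the tool mentioned above: it uses a plain Python dict as a stand-in for an HBase table, and hypothetical `dump_table`/`restore_table` helper names. A real implementation would open an HBase scanner over each table and stream rows to external storage.]

```python
import json

def dump_table(table, path):
    """Scan every row and write it out as one JSON object per line.

    `table` is a dict mapping row key -> {column: value}, standing in
    for a full HBase table scan.
    """
    with open(path, "w") as out:
        for row_key in sorted(table):  # iterate rows in key order, like a scanner
            record = {"row": row_key, "columns": table[row_key]}
            out.write(json.dumps(record) + "\n")

def restore_table(path):
    """Rebuild the table by re-inserting each dumped row.

    This is the costly direction in practice: on a real cluster,
    re-inserting triggers region splits as the data grows.
    """
    table = {}
    with open(path) as dump:
        for line in dump:
            record = json.loads(line)
            table[record["row"]] = record["columns"]
    return table
```

One JSON record per line keeps the dump streamable in both directions, so neither side ever needs the whole table in memory at once.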
