On Mon, Sep 22, 2008 at 8:28 PM, Dan Zinngrabe <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 22, 2008 at 12:13 PM, Charles Mason <[EMAIL PROTECTED]> wrote:
>> Hi All,
>>
>> I was wondering what the options are for backing up and dumping an
>> HBase database. I appreciate that running it on top of an HDFS
>> cluster protects against individual node failure. However, that
>> still doesn't protect against the massive but thankfully rare
>> disasters which take out whole server racks: fire, floods, etc.
>
> There will be something released for this later this week :)
I look forward to that then :)

>> As far as I can tell there are two options:
>>
>> 1, Scan each table and dump every row to some external location,
>> like mysqldump does for MySQL. Then to recover, simply put the
>> data back. I am sure the performance of this is going to be fairly
>> bad.
>
> It's not as bad as you may think, though we have not tested it on very
> large clusters. Depending on your configuration, importing a
> backup is usually the most costly operation, as regions split, etc.

I suppose that's not such a problem; hopefully people won't have to
restore their clusters very often.

>> 2, Image the data stored on the HDFS cluster. Aren't there some big
>> issues with it not grabbing a consistent image, since some updates
>> won't have been flushed? Is there any way to force that, or to make
>> it consistent some other way, perhaps via snapshotting?
>
> That's correct, and we were not able to come up with a good way to
> snapshot HBase. It either took much, much longer than dumping the data
> out of a table, or gave us inconsistent data. Maybe this will be
> easier in a future HBase release, but for now it's probably not
> something you'd want to do with production data.

I can imagine it could be quite a complex problem to solve.

Charlie M
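[Editorial aside: the scan-and-dump approach discussed as option 1 above can be sketched as a simple round trip. This is a minimal illustration only, not the tool Dan mentions: the HBase scanner is stood in by a plain iterable of (row key, columns) pairs, and the `dump_table` / `restore_table` names and JSON-lines dump format are hypothetical choices for the sketch.]

```python
import json

def dump_table(rows, path):
    """Dump (row_key, columns) pairs to a file, one JSON record per
    line -- a stand-in for scanning a real HBase table row by row."""
    with open(path, "w") as f:
        for row_key, columns in rows:
            f.write(json.dumps({"row": row_key, "columns": columns}) + "\n")

def restore_table(path):
    """Read the dump back as a list of (row_key, columns) pairs,
    ready to be re-inserted into a fresh table."""
    restored = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            restored.append((record["row"], record["columns"]))
    return restored
```

Because each row is written independently as it is scanned, the dump streams in constant memory; as noted above, the expensive half in practice is the re-import, where the target table has to split regions as data arrives.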
