On Mon, Sep 22, 2008 at 8:28 PM, Dan Zinngrabe <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 22, 2008 at 12:13 PM, Charles Mason <[EMAIL PROTECTED]> wrote:
>> Hi All,
>>
>> I was wondering what the options there are for backup and dumping an
>> HBase database. I appreciate that having it run on top of a HDFS
>> cluster can protect against individual node failure. However that
>> still doesn't protect against the massive but thankfully rare
>> disasters which take out whole server racks, fire, floods, etc...
>
> There will be something released for this, this week :)

I look forward to that then :)

>>
>> As far as I can tell there are two options:
>>
>> 1, Scan each table and dump every row to some external location, as
>> mysqldump does for MySQL. Then to recover, simply put the dumped data
>> back. I am sure the performance of this is going to be fairly bad.
>
> It's not as bad as you may think, though we have not tested it on very
> large clusters. Depending on your configuration, importing a backup is
> usually the most costly operation, as regions split during the load, etc.

I suppose that's not such a problem; hopefully people won't have to
restore their clusters that often.
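For illustration, the scan-and-dump pattern from option 1 can be sketched generically. The snippet below is a hypothetical Python sketch that uses a plain dict as a stand-in for an HBase table (a real client would iterate a table scanner and batch the puts back in); `dump_table` and `restore_table` are made-up names, not part of any HBase API:

```python
import json

def dump_table(table, path):
    # Scan every row and append it to a newline-delimited JSON file,
    # one record per line, so the dump can be streamed back on restore.
    # The sorted() loop stands in for a full table scan.
    with open(path, "w") as f:
        for row_key in sorted(table):
            f.write(json.dumps({"row": row_key, "cells": table[row_key]}) + "\n")

def restore_table(path):
    # Re-insert each dumped row one at a time. Against a real cluster,
    # this import phase is where region splits make restore the most
    # costly step, as noted above.
    table = {}
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            table[rec["row"]] = rec["cells"]
    return table
```

One line per record means the restore can stream the file instead of loading the whole dump into memory, which matters precisely because the import is the expensive half of the round trip.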

>>
>> 2, Image the data stored on the HDFS cluster directly. Aren't there
>> some big issues with it not grabbing a consistent image, since some
>> updates won't have been flushed? Is there any way to force that, or
>> to make it consistent in some way, perhaps via snapshotting?
>
> That's correct, and we were not able to come up with a good way to
> snapshot HBase. It either took much, much longer than dumping the data
> out of a table, or gave us inconsistent data. Maybe this will be
> easier in a future HBase release, but for now it's probably not
> something you'd want to do with production data.

I can imagine it could be quite a complex problem to solve.


Charlie M