Hello, HBase folks

For your consideration today is the Backup/Restore feature for Apache HBase
2.0. The backup code is available as a mega patch in HBASE-14123 (v61). It
applies cleanly to the current master, all tests pass, and the patch has no
other issues.

The patch has gone through numerous rounds of code review and probably has
the longest discussion thread on Apache JIRA (HBASE-14123) :)

The work has been split into 3 phases (HBASE-14030, HBASE-14123,
HBASE-14414). The first two are complete; the third is still in progress.


*** Summary of work HBASE-14123

The feature adds new command-line extensions to the hbase command and, from
the client side, is accessible only through the command line.

Operations:
* Create full backup on a list of tables or backup set
* Create incremental backup image for table list or backup set
* Restore list of tables from a given backup image
* Show current backup progress
* Delete backup image and all related images
* Show history of backups
* Backup set operations: create backup set, add/remove table to/from backup
set, etc

In the current implementation the feature is already usable: users can back
up tables and restore them using the provided command-line tools. Both full
and incremental backups are supported.
This work is based on the original work of the IBM team (HBASE-7912). The
full list of JIRAs included in this mega patch can be found in three
umbrella JIRAs: HBASE-14030 (Phase 1), HBASE-14123 (Phase 2) and HBASE-14414
(Phase 3; all resolved ones made it into the patch).

*** What are the remaining work items

All remaining items can be found in the Phase 3 umbrella JIRA, HBASE-14414.
They are split into 3 groups: BLOCKER, CRITICAL and MAJOR.
Only BLOCKERs and CRITICALs are guaranteed for the HBase 2.0 release.

***** BLOCKER

* HBASE-14417 Incremental backup and bulk loading (Patch available)
* HBASE-14135 HBase Backup/Restore Phase 3: Merge backup images
* HBASE-14141 HBase Backup/Restore Phase 3: Filter WALs on backup to
include only edits from backup tables (Patch available)
* HBASE-17133 Backup documentation
* HBASE-15227 Fault tolerance support

***** CRITICAL

* HBASE-16465 Disable split/merges during backup

We have an umbrella JIRA (HBASE-14414) to track all the remaining work.
All the BLOCKER and CRITICAL JIRAs currently in the open state will be
implemented by 2.0 release time. Some MAJOR ones will be too, depending on
resource availability.
The former development branch (HBASE-7912) is obsolete and will be
closed/deleted after the merge.
We want backup to be a GA feature in 2.0.
We are going to support full backward compatibility for the backup tool in
2.0 and onwards.

*** Configuration

Backup is disabled by default. To enable it, the following configuration
properties must be added to hbase-site.xml:

hbase.backup.enable=true
hbase.master.logcleaner.plugins=YOUR_PLUGINS,org.apache.hadoop.hbase.backup.master.BackupLogCleaner
hbase.procedure.master.classes=YOUR_CLASSES,org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager
hbase.procedure.regionserver.classes=YOUR_CLASSES,org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager
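
For reference, here is a sketch of how these four properties might look in
hbase-site.xml, assuming the usual Hadoop-style <property> layout. The
property names and class names are taken from the list above. YOUR_PLUGINS
and YOUR_CLASSES are the same placeholders and stand for whatever values
your cluster already configures for those properties.

```xml
<configuration>
  <!-- Turn the backup feature on (it is disabled by default). -->
  <property>
    <name>hbase.backup.enable</name>
    <value>true</value>
  </property>
  <!-- Append the backup log cleaner to the existing plugin list.
       YOUR_PLUGINS is a placeholder for the current value. -->
  <property>
    <name>hbase.master.logcleaner.plugins</name>
    <value>YOUR_PLUGINS,org.apache.hadoop.hbase.backup.master.BackupLogCleaner</value>
  </property>
  <!-- Append the master-side log roll procedure manager.
       YOUR_CLASSES is a placeholder for the current value. -->
  <property>
    <name>hbase.procedure.master.classes</name>
    <value>YOUR_CLASSES,org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager</value>
  </property>
  <!-- Append the region-server-side log roll procedure manager.
       YOUR_CLASSES is a placeholder for the current value. -->
  <property>
    <name>hbase.procedure.regionserver.classes</name>
    <value>YOUR_CLASSES,org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager</value>
  </property>
</configuration>
```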


I would like to thank the IBM team and Jerry He for the original work, and
Enis, Ted, Stack, Matteo and Jerry for the time spent on code reviews.

Special thanks to Ted Yu for his co-development work.

References:

https://issues.apache.org/jira/browse/HBASE-7912 (original IBM, contains
design doc)
https://issues.apache.org/jira/browse/HBASE-14030 (Phase 1)
https://issues.apache.org/jira/browse/HBASE-14123 (Phase 2)
https://issues.apache.org/jira/browse/HBASE-14414 (Phase 3)

Please vote +1/-1 by midnight Pacific Time (00:00 GMT-0800) on March 11th
on whether or not we should merge this into the current master.

-Vladimir Rodionov
