Hello, HBase folks,

For your consideration today is the Backup/Restore feature for Apache HBase 2.0.

The backup code is available as a mega patch in HBASE-14123 (v61). It applies cleanly to the current master, all tests pass, and the patch has no other issues.
The patch has gone through numerous rounds of code review and has probably the lengthiest discussion thread on Apache JIRA (HBASE-14123) :)

The work has been split into 3 phases (HBASE-14030, 14123, 14414). The first two are complete; the third is still in progress.

*** Summary of work HBASE-14123

The new feature introduces new command-line extensions to the hbase command and, from the client side, is accessible only through the command line.

Operations:

* Create a full backup of a list of tables or a backup set
* Create an incremental backup image for a table list or a backup set
* Restore a list of tables from a given backup image
* Show current backup progress
* Delete a backup image and all related images
* Show the history of backups
* Backup set operations: create a backup set, add/remove a table to/from a backup set, etc.

In the current implementation, the feature is already usable, meaning that users can back up tables and restore them using the provided command-line tools. Both full and incremental backups are supported.

This work is based on the original work of the IBM team (HBASE-7912).

The full list of JIRAs included in this mega patch can be found in three umbrella JIRAs: HBASE-14030 (Phase 1), HBASE-14123 (Phase 2) and HBASE-14414 (Phase 3; all resolved ones made it into the patch).

*** What are the remaining work items

All remaining items can be found in the Phase 3 umbrella JIRA: HBASE-14414. They are split into 3 groups: BLOCKER, CRITICAL, MAJOR. Only BLOCKERs and CRITICALs are guaranteed for the HBase 2.0 release.

***** BLOCKER

* HBASE-14417 Incremental backup and bulk loading (Patch available)
* HBASE-14135 HBase Backup/Restore Phase 3: Merge backup images
* HBASE-14141 HBase Backup/Restore Phase 3: Filter WALs on backup to include only edits from backup tables (Patch available)
* HBASE-17133 Backup documentation
* HBASE-15227 Fault tolerance support

***** CRITICAL

* HBASE-16465 Disable split/merges during backup

We have an umbrella JIRA (HBASE-14414) to track all the remaining work. All the BLOCKER and CRITICAL JIRAs that are currently open will be implemented by 2.0 release time. Some MAJOR ones will be too, depending on resource availability.

The former development branch (HBASE-7912) is obsolete and will be closed/deleted after the merge.

We want backup to be a GA feature in 2.0. We are going to support full backward compatibility for the backup tool in 2.0 and onwards.

**** Configuration

Backup is disabled by default. To enable it, the following configuration properties must be added to hbase-site.xml:

  hbase.backup.enable=true
  hbase.master.logcleaner.plugins=YOUR_PLUGINS,org.apache.hadoop.hbase.backup.master.BackupLogCleaner
  hbase.procedure.master.classes=YOUR_CLASSES,org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager
  hbase.procedure.regionserver.classes=YOUR_CLASSES,org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager

I would like to thank the IBM team and Jerry He for the original work, and Enis, Ted, Stack, Matteo, and Jerry for the time spent on code reviews. Special thanks to Ted Yu for his co-development work.

References:

https://issues.apache.org/jira/browse/HBASE-7912 (original IBM, contains design doc)
https://issues.apache.org/jira/browse/HBASE-14030 (Phase 1)
https://issues.apache.org/jira/browse/HBASE-14123 (Phase 2)
https://issues.apache.org/jira/browse/HBASE-14414 (Phase 3)

Please vote +1/-1 by midnight Pacific Time (00:00 -0800 GMT) on March 11th on whether or not we should merge this into the current master.

-Vladimir Rodionov