[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017538#comment-14017538 ]
Honghua Feng commented on HBASE-7912:
-------------------------------------

Just finished reading the design doc "HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf". It's a good enhancement and extension to the current data backup/restore options, and the design doc is concise and clear :-)

Some comments:
# "Use case example 1" on page 3: The full backup doesn't contain data of table3 and table4, so when restoring table3 and table4, their data is restored entirely from the incremental backups, right? That doesn't sound like a typical backup/restore scenario (full backup + incremental backups).
# "4. Full Backup": Does the log roll take place after taking the (full) snapshot? What if new writes arrive after the snapshot is taken but before the log roll?
# "5. Incremental Backup": What if some RS fails during the log roll procedure, so that not all current log numbers are recorded in ZooKeeper?
# What if some log files are archived/deleted between two incremental backups and are not included in any incremental backup? Is that possible?

Some (possible) typos in the design doc:
# "2. Key features and Use Cases": "Full back uses HBase..." => "Full backup uses HBase..."
# "5. Incremental Backup": "kicks of a global..." => "kicks off a global..."
# "5. Incremental Backup": "Incremental backups and also be..." => "Incremental backups can also be..."

> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf,
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we have completed the implementation of our backup/restore solution,
> and would like to share it with the community through this jira.
> We leverage the existing HBase snapshot feature and provide a general
> solution for common users.
> Our full backup uses snapshot to capture metadata locally and
> ExportSnapshot to move data to another cluster; the incremental backup uses
> offline-WALPlayer to back up HLogs; we also leverage globally distributed
> log roll and flush to improve performance, plus other added value such as
> convert, merge, progress report, and CLI commands, so that a common user can
> back up HBase data without in-depth knowledge of HBase.
> Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI commands will be attached to this jira.
> We plan to use 10~12 subtasks to share each of the following features, and
> document the detailed implementation in the subtasks:
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental
> backup)
> * *distributed* log roll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into HFiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as the history of the
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many
> advancements and new features in HBase, for example, FileLink, Snapshot, and
> Distributed Barrier Procedure. This is a proposal for a backup/restore
> solution that utilizes these new features to achieve better performance and
> consistency.
>
> A common practice for backup and restore in databases is to first take a full
> baseline backup, and then periodically take incremental backups that capture
> the changes since the full baseline backup. An HBase cluster can store a
> massive amount of data. Combining full backups with incremental backups has
> tremendous benefit for HBase as well. The following is a typical scenario
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes
> since the full backup, or since the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).
> Then the incremental backups up to the desired point in time are
> applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but use bulk load of
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and
> restore.
> * Restore to different table names.
> * Support adding additional tables or CFs to a backup set without interrupting
> the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period,
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same
> cluster or across clusters. It has the flexibility to support backup to
> other devices and servers in the future.
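
To make the restore scenario in the quoted description concrete, here is a minimal sketch of how a restore chain might be picked for a given point in time: take the latest full backup at or before the target, then every later incremental backup up to the target, in order. This is only an illustration in Python. The BackupImage type, its fields, and the restore_chain function are hypothetical names, not part of the HBASE-7912 design or of any HBase API, and the sketch ignores per-table backup sets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupImage:
    # All fields are hypothetical; the real backup manifest may look different.
    backup_id: str
    kind: str          # "full" or "incremental"
    completed_at: int  # completion time, e.g. epoch milliseconds


def restore_chain(images: List[BackupImage], target_ts: int) -> List[BackupImage]:
    """Return the full backup and the incrementals needed to reach target_ts."""
    fulls = [i for i in images if i.kind == "full" and i.completed_at <= target_ts]
    if not fulls:
        raise ValueError("no full backup at or before the requested point in time")

    # Base image: the most recent full backup not later than the target.
    base = max(fulls, key=lambda i: i.completed_at)

    # Incrementals taken after the base and not later than the target,
    # applied oldest first on top of the full backup.
    incrementals = sorted(
        (
            i
            for i in images
            if i.kind == "incremental"
            and base.completed_at < i.completed_at <= target_ts
        ),
        key=lambda i: i.completed_at,
    )
    return [base] + incrementals


if __name__ == "__main__":
    history = [
        BackupImage("full-1", "full", 1000),
        BackupImage("inc-1", "incremental", 2000),
        BackupImage("inc-2", "incremental", 3000),
        BackupImage("full-2", "full", 4000),
        BackupImage("inc-3", "incremental", 5000),
    ]
    # Restoring to time 3500 needs full-1, then inc-1 and inc-2.
    print([i.backup_id for i in restore_chain(history, 3500)])
    # prints ['full-1', 'inc-1', 'inc-2']
```

A chain built this way is only complete if every WAL written between consecutive images was captured by some incremental backup, which is the concern raised in comment 4 above.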