Bryan Beaudreault created HBASE-28423:
-----------------------------------------

             Summary: Improvements to backup of bulkloaded files
                 Key: HBASE-28423
                 URL: https://issues.apache.org/jira/browse/HBASE-28423
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


Backup/Restore has support for including bulkloaded files in incremental 
backups. There is a coprocessor hook which registers all bulkloads into a 
backup:system_bulk table. A cleaner plugin ensures that these files are not 
cleaned up from the archive until they are backed up. When the incremental 
backup occurs, the files are deleted from the system_bulk table and then 
cleaned up.

We have encountered two problems to be solved with this:
 # The deletion process only happens during incremental backups, not full 
backups. A full backup already includes all data in the table via a snapshot 
export. So we should clear any pending bulkloads upon full backup.
 # There is currently no linking of bulkload state to backupRoot. It's possible 
to have multiple backupRoots for tables. For example, you might back up to two 
destinations with different schedules. Currently, whichever backupRoot does an 
incremental backup first will be the one to include the bulkloads, and the 
entries are then removed from the system_bulk table, so the other backupRoots 
never pick them up. We need some sort of mapping of bulkload to backupRoot, and 
we should only delete the rows from system_bulk once the files have been 
included in all active backupRoots.
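
To make the two proposals concrete, here is a rough in-memory sketch of the bookkeeping they imply. It is not HBase code: BulkLoadTracker and all of its methods are hypothetical names, it models a single table, and it assumes the set of active backupRoots is known when a bulkload is registered. Each bulkloaded file remembers which backupRoots still need it, any completed backup (full or incremental) clears that root's obligation, and the entry is dropped only once no root is left.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of per-backupRoot bulkload tracking; not an HBase class. */
class BulkLoadTracker {
    // Bulkloaded file path -> backupRoots that have not yet captured it.
    private final Map<String, Set<String>> pendingRootsByFile = new HashMap<>();
    private final Set<String> activeRoots;

    BulkLoadTracker(Set<String> activeRoots) {
        this.activeRoots = new HashSet<>(activeRoots);
    }

    /** Stand-in for the coprocessor hook: every active root owes this file a backup. */
    void registerBulkLoad(String file) {
        pendingRootsByFile.put(file, new HashSet<>(activeRoots));
    }

    /** Files an incremental backup to the given root would still need to copy. */
    Set<String> filesPendingFor(String backupRoot) {
        Set<String> files = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : pendingRootsByFile.entrySet()) {
            if (e.getValue().contains(backupRoot)) {
                files.add(e.getKey());
            }
        }
        return files;
    }

    /**
     * Called when a backup to backupRoot completes. A full backup already
     * captures the data via snapshot export, so it clears pending files too.
     */
    void markBackedUp(String backupRoot) {
        for (Set<String> roots : pendingRootsByFile.values()) {
            roots.remove(backupRoot);
        }
        // Drop entries only once every active root has captured the file.
        pendingRootsByFile.values().removeIf(Set::isEmpty);
    }

    /** What a cleaner plugin would ask before deleting an archived file. */
    boolean isPending(String file) {
        return pendingRootsByFile.containsKey(file);
    }
}
```

With two roots, a file registered once stays pending after the first root backs up and is released only after the second does. A real implementation would also have to persist this state and decide how to treat backupRoots that are added or retired later; the sketch does not address either.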



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
