Bryan Beaudreault created HBASE-28423:
-----------------------------------------

Summary: Improvements to backup of bulkloaded files
Key: HBASE-28423
URL: https://issues.apache.org/jira/browse/HBASE-28423
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault

Backup/Restore has support for including bulkloaded files in incremental backups. There is a coprocessor hook which registers all bulkloads into a backup:system_bulk table. A cleaner plugin ensures that these files are not cleaned up from the archive until they are backed up. When the incremental backup occurs, the files are deleted from the system_bulk table and then cleaned up.

We have encountered two problems to be solved with this:

# The deletion process only happens during incremental backups, not full backups. A full backup already includes all data in the table via a snapshot export, so we should clear any pending bulkloads upon full backup.
# There is currently no linking of bulkload state to backupRoot. It's possible to have multiple backupRoots for tables. For example, you might back up to two destinations with different schedules. Currently, whichever backupRoot does an incremental backup first will be the one to include the bulkloads, which are then removed from the system_bulk table, so backups to the other backupRoots never include them. We need some sort of mapping of bulkload to backupRoot, and we should only delete the rows from system_bulk once the files have been included in all active backupRoots.
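
To make the second point concrete, here is a minimal in-memory sketch of the proposed bookkeeping. It is only an illustration of the idea, not HBase code: the class BulkLoadTracker and its methods are hypothetical names, the state would really live in the backup system table, and the sketch assumes a single table and a fixed set of active backupRoots. It also assumes that a full backup and an incremental backup both count as "including" every pending bulkload for that root, which covers the first point as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of per-backupRoot bulkload tracking; not an HBase API. */
class BulkLoadTracker {
    // Assumption: the set of active backupRoots is known up front.
    private final Set<String> activeRoots;
    // Bulkloaded file -> backupRoots that have not yet included it.
    private final Map<String, Set<String>> pending = new HashMap<>();

    BulkLoadTracker(Set<String> activeRoots) {
        this.activeRoots = new HashSet<>(activeRoots);
    }

    /** Called when a bulkload is observed (the coprocessor hook in the real system). */
    void registerBulkLoad(String file) {
        pending.put(file, new HashSet<>(activeRoots));
    }

    /** Called after a full or incremental backup to the given root completes. */
    void onBackupComplete(String backupRoot) {
        // This root has now captured every file that was pending for it.
        pending.values().forEach(roots -> roots.remove(backupRoot));
        // Drop a file only once no active root still needs it.
        pending.values().removeIf(Set::isEmpty);
    }

    /** True while the file must still be protected from the archive cleaner. */
    boolean isPending(String file) {
        return pending.containsKey(file);
    }
}
```

With two roots, a file registered via registerBulkLoad stays pending after onBackupComplete for the first root and is released only after onBackupComplete for the second, which is the behavior requested above.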