Ray Mattingly created HBASE-28697:
-------------------------------------

             Summary: Incremental backups delete bulk loaded system table rows 
too early
                 Key: HBASE-28697
                 URL: https://issues.apache.org/jira/browse/HBASE-28697
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Ray Mattingly


I've been thinking through the incremental backup order of operations, and I 
think we delete rows from the bulk loads system table too early and, 
consequently, make it possible to produce a "successful" incremental backup 
that is missing bulk loads.

To summarize the steps here, starting in 
{{IncrementalTableBackupClient#execute}}:
 # We take an incremental backup of the WALs generated since the last backup
 # We ensure any bulk loads done since the last backup are appropriately 
represented in the new backup by going through the system table and copying the 
appropriate files to the backup directory
 # We delete all of the system table rows which told us about these bulk loads
 # We generate a backup manifest and mark the backup as complete

If we begin deleting the system table rows for these bulk loads in step 3, 
but then fail in step 3 or 4 before we are able to mark the backup as 
complete, we'll be in a precarious spot. A retried incremental backup may 
succeed, but it would not know to persist the bulk loaded files whose system 
table references we have already deleted.
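
To make the failure mode concrete, here is a toy in-memory model of steps 2 
through 4 in their current order. It is only a sketch: the class, field, and 
method names are hypothetical and are not HBase APIs, and the system table and 
backup directory are replaced by plain sets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the current ordering; all names here are illustrative, not HBase APIs.
final class EarlyDeleteSketch {
    // Stand-in for the bulk load rows in the backup system table.
    final Set<String> bulkLoadRows = new TreeSet<>();
    // Files captured by backups that were actually marked complete.
    final Set<String> completedBackupFiles = new TreeSet<>();

    // Runs steps 2-4 in the current order; returns true if the backup was marked complete.
    boolean incrementalBackup(boolean failBeforeComplete) {
        List<String> copied = new ArrayList<>(bulkLoadRows); // step 2: copy bulk loaded files
        bulkLoadRows.clear();                                // step 3: delete the system table rows
        if (failBeforeComplete) {
            return false; // failure before step 4: the copies belong to a failed backup
        }
        completedBackupFiles.addAll(copied);                 // step 4: manifest + mark complete
        return true;
    }

    public static void main(String[] args) {
        EarlyDeleteSketch s = new EarlyDeleteSketch();
        s.bulkLoadRows.add("hfile-1");
        boolean first = s.incrementalBackup(true);  // fails after the rows are deleted
        boolean retry = s.incrementalBackup(false); // "succeeds", but sees no bulk load rows
        System.out.println("first=" + first + " retry=" + retry
            + " hfile-1 backed up=" + s.completedBackupFiles.contains("hfile-1"));
    }
}
```

Running {{main}} prints {{first=false retry=true hfile-1 backed up=false}}: the 
retry reports success even though no completed backup contains the bulk 
loaded file.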

We could consider this issue an extension or replacement of 
https://issues.apache.org/jira/browse/HBASE-28084 in some ways, depending on 
what solution we land on. I think that we could fix this specific issue by 
reordering the bulk load table cleanup, but there will always be gotchas like 
this. Maybe it is simpler to require that the next backup be a full backup 
after any incremental failure.
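
As a rough illustration of the reordering idea, the same kind of toy model can 
defer the row cleanup until after the backup is marked complete. This is a 
sketch under assumptions, not a proposed patch: every name below is 
hypothetical, and the real change would need to account for whatever else 
reads or writes the bulk load rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a reordered cleanup; all names here are illustrative, not HBase APIs.
final class DeferredCleanupSketch {
    // Stand-in for the bulk load rows in the backup system table.
    final Set<String> bulkLoadRows = new TreeSet<>();
    // Files captured by backups that were actually marked complete.
    final Set<String> completedBackupFiles = new TreeSet<>();

    // Copies files, marks the backup complete, and only then deletes the rows it handled.
    boolean incrementalBackup(boolean failBeforeComplete) {
        List<String> copied = new ArrayList<>(bulkLoadRows); // copy bulk loaded files
        if (failBeforeComplete) {
            return false; // rows are untouched, so a retry still sees them
        }
        completedBackupFiles.addAll(copied);                 // manifest + mark complete
        bulkLoadRows.removeAll(copied);                      // cleanup last, only handled rows
        return true;
    }

    public static void main(String[] args) {
        DeferredCleanupSketch s = new DeferredCleanupSketch();
        s.bulkLoadRows.add("hfile-1");
        boolean first = s.incrementalBackup(true);  // fails, rows survive
        boolean retry = s.incrementalBackup(false); // picks the rows up again
        System.out.println("first=" + first + " retry=" + retry
            + " hfile-1 backed up=" + s.completedBackupFiles.contains("hfile-1"));
    }
}
```

In this model {{main}} prints {{first=false retry=true hfile-1 backed up=true}}. 
A failure during the final cleanup would leave stale rows behind, which in 
this sketch costs a redundant copy on the next backup rather than a missing 
bulk load.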



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
