[ 
https://issues.apache.org/jira/browse/HBASE-29800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050472#comment-18050472
 ] 

Hernan Gelaf-Romer edited comment on HBASE-29800 at 1/7/26 9:31 PM:
--------------------------------------------------------------------

The startCode is effectively useless at the moment, because it contains 
the oldest timestamp in the log boundary map, which can be arbitrarily old 
since the boundary map is never pruned. I see the argument for removing it. 
Is there an argument for redefining the startCode as the "oldest WAL file 
included in the backup" and keeping it around as metadata? That said, if 
there's no precedent for it or desire to keep it around, I'm okay with 
removing it. 

 

If we do remove it, I'm on board with your strategy for fixing this bug.
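
To make the two startCode definitions discussed above concrete, here is a 
minimal sketch. It is not the HBase backup API: the class and method names 
are hypothetical, the log boundary map is modeled as a plain map from region 
server name to timestamp, and the 0L fallback for empty input is an arbitrary 
choice for the sketch.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not the HBase backup API.
final class StartCodeSketch {
    private StartCodeSketch() {}

    /** Behaviour described in the comment: the oldest timestamp in the boundary map. */
    static long oldestBoundary(Map<String, Long> logBoundaries) {
        return logBoundaries.values().stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(0L); // assumption: 0 when the map is empty
    }

    /** Suggested alternative: timestamp of the oldest WAL file included in the backup. */
    static long oldestIncludedWal(Collection<Long> includedWalTimestamps) {
        return includedWalTimestamps.stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(0L); // assumption: 0 when no WAL files were included
    }

    public static void main(String[] args) {
        // A never-pruned map keeps a stale entry for a server that is long gone.
        Map<String, Long> boundaries =
            Map.of("rs-decommissioned", 100L, "rs-1", 9_000L, "rs-2", 9_500L);
        List<Long> includedWals = List.of(9_000L, 9_200L, 9_500L);

        System.out.println(oldestBoundary(boundaries));      // 100
        System.out.println(oldestIncludedWal(includedWals)); // 9000
    }
}
```

In this model the stale entry drags the first value down to 100 no matter how 
recent the backup is, while the second value still describes the backup itself.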


> WAL logs are unprotected during first full backup
> -------------------------------------------------
>
>                 Key: HBASE-29800
>                 URL: https://issues.apache.org/jira/browse/HBASE-29800
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>            Reporter: Dieter De Paepe
>            Priority: Major
>
> There is a small window during the creation of the first full backup in the 
> first/only backup root where WAL logs might be eligible for deletion, which 
> could lead to data loss in subsequent incremental backups.
> Pseudo code for this scenario is as follows (see 
> FullTableBackupClient#execute):
> {code:java}
> // This is our first backup. Let's put some marker to system table so that we
> // can hold the logs while we do the backup.
> backupManager.writeBackupStartCode(0L);
> // Roll the WALs
> BackupUtils.logRoll(...);
> snapshotAndCopyTables();
> backupManager.writeBackupStartCode(newStartCode);
> // Register the backupInfo as completed
> completeBackup(...);{code}
> The comment on the "0" backupStartCode suggests that it prevents WAL deletion 
> until the backup is completed, but this is not the case.
> The component responsible for preventing WAL deletion for backups is 
> BackupLogCleaner. While the log cleaner does read & use the backup start 
> codes, it only does so for backups that are already completed:
> {code:java}
> // true means only include completed backups
> List<BackupInfo> backups = sysTable.getBackupHistory(true); {code}
> So the log cleaner will not even be aware of the backup root.
> I believe this means there is a risk of data loss in the next incremental 
> backup if, after a table has been snapshotted but before the backup is 
> completed, a log roll occurs and the log cleaner runs.
> The simplest fix is probably to have the log cleaner also use in-progress 
> backupInfos to calculate the startCode.
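
The gap described in the issue, and the effect of the proposed fix, can be 
sketched with a deliberately simplified model. None of the types below are 
the real HBase classes: Backup, State, retentionBoundary and isDeletable are 
hypothetical names, and the real cleaner works per backup root and per region 
server, which this sketch ignores.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical, simplified model; none of these types are the real HBase classes.
final class LogCleanerSketch {
    enum State { RUNNING, COMPLETE }

    static final class Backup {
        final State state;
        final long startTs; // assumption: timestamp of the log roll at backup start

        Backup(State state, long startTs) {
            this.state = state;
            this.startTs = startTs;
        }
    }

    /** Oldest start timestamp among the backups the cleaner looks at; empty if it sees none. */
    static OptionalLong retentionBoundary(List<Backup> backups, boolean includeInProgress) {
        return backups.stream()
            .filter(b -> includeInProgress || b.state == State.COMPLETE)
            .mapToLong(b -> b.startTs)
            .min();
    }

    /** A WAL may be deleted if the cleaner knows of no backup, or if it predates the boundary. */
    static boolean isDeletable(long walTs, OptionalLong boundary) {
        return !boundary.isPresent() || walTs < boundary.getAsLong();
    }

    public static void main(String[] args) {
        // The first full backup is still running; a WAL is written after its log roll.
        List<Backup> backups = List.of(new Backup(State.RUNNING, 1_000L));
        long walTs = 1_500L;

        // Completed backups only: the cleaner sees nothing, so the WAL is unprotected.
        System.out.println(isDeletable(walTs, retentionBoundary(backups, false))); // true
        // In-progress backups included: the WAL is retained.
        System.out.println(isDeletable(walTs, retentionBoundary(backups, true)));  // false
    }
}
```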



