Dieter De Paepe created HBASE-29800:
---------------------------------------
Summary: WAL logs are unprotected during first full backup
Key: HBASE-29800
URL: https://issues.apache.org/jira/browse/HBASE-29800
Project: HBase
Issue Type: Bug
Components: backup&restore
Reporter: Dieter De Paepe
There is a small window during the creation of the first full backup in the
first (or only) backup root in which WAL files may be eligible for deletion.
This could lead to data loss in subsequent incremental backups.
Pseudocode for this scenario is as follows (see FullTableBackupClient#execute):
{code:java}
// This is our first backup. Let's put some marker to system table so that we
// can hold the logs while we do the backup.
backupManager.writeBackupStartCode(0L);
// Roll the WALs
BackupUtils.logRoll(...);
snapshotAndCopyTables();
backupManager.writeBackupStartCode(newStartCode);
// Register the backupInfo as completed
completeBackup(...);{code}
The comment on the "0" backupStartCode suggests that it prevents WAL deletion
until the backup is completed, but this is not the case.
The component responsible for preventing WAL deletion for backups is
BackupLogCleaner. While the log cleaner does read & use the backup start codes,
it only does so for backups that are already completed:
{code:java}
// true means only include completed backups
List<BackupInfo> backups = sysTable.getBackupHistory(true); {code}
So while the first backup is still in progress, the log cleaner will not even
be aware of the backup root.
I believe this means there is a risk of data loss in the following incremental
backup if, after a table has been snapshotted but before the backup is marked
as completed, a log roll occurs and the log cleaner runs.
The simplest fix is probably to have the log cleaner also take in-progress
backupInfos into account when calculating the startCode.
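
A minimal sketch of that idea, using a hypothetical simplified model rather
than the real HBase classes. The names BackupLogCleanerSketch, Backup, State
and canDelete are made up for illustration, and the deletion rule (a WAL is
deletable only if it is older than the lowest start code the cleaner knows
about, or if the cleaner knows of no backups at all) is an assumption that
approximates what BackupLogCleaner does:

```java
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical, simplified model; none of these types are the real HBase classes.
final class BackupLogCleanerSketch {

    enum State { RUNNING, COMPLETE }

    static final class Backup {
        final State state;
        final long startCode;

        Backup(State state, long startCode) {
            this.state = state;
            this.startCode = startCode;
        }
    }

    /**
     * Assumed rule: a WAL may be deleted if no backup is visible to the cleaner,
     * or if the WAL is older than the lowest start code among the visible backups.
     * With includeInProgress=false only completed backups are visible (current
     * behaviour as described above); with true, running backups count as well.
     */
    static boolean canDelete(long walTimestamp, List<Backup> backups, boolean includeInProgress) {
        OptionalLong lowestStartCode = backups.stream()
            .filter(b -> includeInProgress || b.state == State.COMPLETE)
            .mapToLong(b -> b.startCode)
            .min();
        return !lowestStartCode.isPresent() || walTimestamp < lowestStartCode.getAsLong();
    }

    public static void main(String[] args) {
        // First full backup is still running and has written the "0" start code.
        List<Backup> backups = Collections.singletonList(new Backup(State.RUNNING, 0L));
        long rolledWal = 1000L; // WAL rolled after the snapshot, before completion

        System.out.println("completed only: deletable=" + canDelete(rolledWal, backups, false));
        System.out.println("with in-progress: deletable=" + canDelete(rolledWal, backups, true));
    }
}
```

In this model, the WAL rolled during the first backup is reported as deletable
when only completed backups are considered, and as protected once the running
backup with start code 0 is included.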