DieterDP-ng commented on code in PR #7582:
URL: https://github.com/apache/hbase/pull/7582#discussion_r2834656669
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String, Long> olderTimestamps,
} else if (currentLogTS > oldTimeStamp) {
resultLogFiles.add(currentLogFile);
}
-
- // It is possible that a host in .oldlogs is an obsolete region server
Review Comment:
Imagine the following scenario, where the storage underlying HBase is slow:
- An incremental backup is running
- `IncrementalBackupManager#getIncrBackupLogFileMap` executes the log roll
at timestamp `T`: `BackupUtils.logRoll(...)`. At this point, the logs that have
been written to storage represent an (imperfect) snapshot of the state of the
different tables.
- `IncrementalBackupManager#getIncrBackupLogFileMap` executes the log
retrieval: `getLogFilesForNewBackup(...)`
- Now, because we assume the storage is slow, it can take multiple minutes
before all files have been added to `resultLogFiles`. Let's say this completes
at `T+10`.
- If, during this time, additional data is ingested into HBase that causes
more WALs to be written, some of these may be picked up by
`getLogFilesForNewBackup` while others aren't. Since we're assuming the check
has been removed, nothing excludes the extra WALs from the backup. This could
mean that for some region servers we have logs up to timestamp `T`, for others
up to `T+5`, and for others up to `T+10`.
So, removing the check makes the contents of the data included in the backup
more variable.
But to reiterate: I don't think this is a likely or realistic case, but it
illustrates the idea behind the mechanism that decides which logs to
include.
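To make the scenario concrete, here is a minimal sketch of the selection logic with and without an upper cutoff. It is a toy model, not the HBase code: `WalCutoffSketch`, `selectWals`, `newestIncluded`, and `exampleListing` are hypothetical names, each WAL is reduced to a bare timestamp, and a single cutoff `T` stands in for whatever per-server timestamps the real check compares against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT HBase code: a WAL is represented only by its timestamp.
class WalCutoffSketch {

  // Keeps WALs newer than the previous backup. If cutoffTs is non-null, also
  // drops WALs newer than the cutoff (the assumed role of the removed check).
  static Map<String, List<Long>> selectWals(Map<String, List<Long>> walsByServer,
      long previousBackupTs, Long cutoffTs) {
    Map<String, List<Long>> selected = new TreeMap<>();
    for (Map.Entry<String, List<Long>> entry : walsByServer.entrySet()) {
      List<Long> picked = new ArrayList<>();
      for (long ts : entry.getValue()) {
        boolean newerThanLastBackup = ts > previousBackupTs;
        boolean withinCutoff = cutoffTs == null || ts <= cutoffTs;
        if (newerThanLastBackup && withinCutoff) {
          picked.add(ts);
        }
      }
      selected.put(entry.getKey(), picked);
    }
    return selected;
  }

  // Newest timestamp in the list, or -1 if the list is empty.
  static long newestIncluded(List<Long> timestamps) {
    long newest = -1L;
    for (long ts : timestamps) {
      newest = Math.max(newest, ts);
    }
    return newest;
  }

  // What a slow listing might see with the log roll at T = 100:
  // rs1 is listed at T, rs2 at T+5, rs3 at T+10 (made-up numbers).
  static Map<String, List<Long>> exampleListing() {
    Map<String, List<Long>> seen = new TreeMap<>();
    seen.put("rs1", Arrays.asList(90L, 100L));
    seen.put("rs2", Arrays.asList(95L, 100L, 105L));
    seen.put("rs3", Arrays.asList(92L, 100L, 110L));
    return seen;
  }

  public static void main(String[] args) {
    long previousBackupTs = 80L;
    long rollTs = 100L;
    System.out.println("without cutoff: "
        + selectWals(exampleListing(), previousBackupTs, null));
    System.out.println("with cutoff T:  "
        + selectWals(exampleListing(), previousBackupTs, rollTs));
  }
}
```

Without the cutoff, the newest included WAL differs per server (100, 105, 110 in this made-up listing); with the cutoff, every server stops at `T` = 100.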