DieterDP-ng commented on code in PR #7582:
URL: https://github.com/apache/hbase/pull/7582#discussion_r2834656669


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String, Long> olderTimestamps,
       } else if (currentLogTS > oldTimeStamp) {
         resultLogFiles.add(currentLogFile);
       }
-
-      // It is possible that a host in .oldlogs is an obsolete region server

Review Comment:
   Imagine the following scenario, where the storage underlying HBase is slow:
   
   - An incremental backup is running
   - `IncrementalBackupManager#getIncrBackupLogFileMap` executes the log roll 
at timestamp `T`: `BackupUtils.logRoll(...)`. At this point the logs that are 
written to storage represent a (not perfect) snapshot of the state of the 
different tables.
   - `IncrementalBackupManager#getIncrBackupLogFileMap` executes the log 
retrieval: `getLogFilesForNewBackup(...)`
   - Now, because we assume the storage is slow, it can take multiple minutes 
before all files have been added to `resultLogFiles`. Let's say this completes 
at `T+10`.
   - If, during this time, additional data is ingested into HBase and causes 
more WALs to be written, `getLogFilesForNewBackup` may pick up some of them and 
miss others. Since we're assuming the check has been removed, nothing excludes 
these extra WALs from the backup. This means that for some region servers we 
could have logs up to timestamp `T`, for others up to `T+5`, and for others up 
to `T+10`.
   
   So, removing the check introduces more variability in which data ends up in 
the backup.
   
   But to reiterate: I don't think this is a realistic case, but it 
illustrates the idea behind the mechanism that decides which logs to 
include.
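
   To make the scenario concrete, here is a minimal, self-contained sketch. It 
is not the HBase code: `Wal`, `select`, `coverage` and `slowListing` are 
hypothetical names, the timestamps are made up (`T` = 100, previous backup at 
80), and the upper bound is modelled simply as "skip WALs created after the log 
roll", which is an assumption about what the removed check achieves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WalSelectionSketch {

  /** Hypothetical stand-in for a WAL file: owning region server and creation timestamp. */
  static final class Wal {
    final String host;
    final long ts;

    Wal(String host, long ts) {
      this.host = host;
      this.ts = ts;
    }
  }

  /**
   * Keeps WALs newer than the previous backup's timestamp for their host.
   * When applyUpperBound is true, WALs created after the log roll are skipped
   * (assumed effect of the check under discussion).
   */
  static List<Wal> select(List<Wal> listed, Map<String, Long> olderTimestamps, long rollTs,
      boolean applyUpperBound) {
    List<Wal> result = new ArrayList<>();
    for (Wal wal : listed) {
      Long old = olderTimestamps.get(wal.host);
      boolean newerThanLastBackup = old == null || wal.ts > old;
      boolean afterRoll = wal.ts > rollTs;
      if (newerThanLastBackup && !(applyUpperBound && afterRoll)) {
        result.add(wal);
      }
    }
    return result;
  }

  /** Highest included WAL timestamp per host, i.e. how far the backup reaches per server. */
  static Map<String, Long> coverage(List<Wal> selected) {
    Map<String, Long> max = new TreeMap<>();
    for (Wal wal : selected) {
      max.merge(wal.host, wal.ts, Long::max);
    }
    return max;
  }

  /** Made-up slow listing: rs1 is listed at T, rs2 at T+5, rs3 at T+10 (T = 100). */
  static List<Wal> slowListing() {
    return Arrays.asList(
      new Wal("rs1", 90), new Wal("rs1", 100),
      new Wal("rs2", 95), new Wal("rs2", 100), new Wal("rs2", 105),
      new Wal("rs3", 92), new Wal("rs3", 100), new Wal("rs3", 105), new Wal("rs3", 110));
  }

  public static void main(String[] args) {
    Map<String, Long> older = new TreeMap<>();
    for (String host : Arrays.asList("rs1", "rs2", "rs3")) {
      older.put(host, 80L);
    }
    long rollTs = 100L;
    System.out.println("with check:    " + coverage(select(slowListing(), older, rollTs, true)));
    System.out.println("without check: " + coverage(select(slowListing(), older, rollTs, false)));
  }
}
```

   With the bound applied, every region server's coverage ends at 100 (`T`). 
Without it, `rs1` still ends at 100, but `rs2` reaches 105 and `rs3` reaches 
110, which is the per-server skew described above.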


