dParikesit opened a new pull request, #6970:
URL: https://github.com/apache/hbase/pull/6970

   Jira: [HBASE-29296](https://issues.apache.org/jira/browse/HBASE-29296)
   
   In HBase, it is crucial to prevent expired snapshots from being returned to clients in order to ensure correctness. There have been existing efforts (e.g., 
[HBASE-27671](https://issues.apache.org/jira/browse/HBASE-27671) and 
[HBASE-28704](https://issues.apache.org/jira/browse/HBASE-28704)) adding 
snapshot expiration checks in different scenarios to avoid such issues. 
However, we found that this protection is not applied consistently. Specifically, several 
operations still miss such checks in the latest HBase version (commit 5dafa9e). Their 
patterns are similar to those in the tickets mentioned above. In practice, we 
observed expired snapshots still being returned to clients successfully, without 
raising any alarms.
   
   We have written test cases showing that these issues can be reproduced 
reliably (see attached). We also include the manual reproduction steps below in case anyone is 
interested.
   
   Your insights are very much appreciated. We will continue following up on this 
issue until it is resolved.
   
   Reproduction steps (3 scenarios in total):
   
   1. Restore
   Restoring from a full backup succeeds even if the snapshot it was taken from has 
expired. This expiration can happen if `hbase.master.snapshot.ttl` was set when the backup was created.
   
   Steps to reproduce this bug:
   A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
   B. Create a table
   C. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
   D. Wait until the snapshot has expired
   E. Restore the table using `hbase restore hdfs://host5:9000/data/backup 
<backup_id>`
   F. Check that the table is restored successfully
   
   We propose to add a snapshot expiration check in 
RestoreTool.java:createAndRestoreTable to prevent this issue.
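
   To make the idea concrete, here is a minimal, self-contained sketch of the kind of guard we have in mind. The class and method names (`SnapshotTtlGuard`, `verifyNotExpired`) are ours, purely for illustration; the actual patch would more likely reuse the existing master-side TTL helpers rather than re-implement the arithmetic:

   ```java
   import java.io.IOException;
   import java.util.concurrent.TimeUnit;

   /**
    * Minimal sketch of the guard we have in mind. An equivalent check would be
    * invoked from RestoreTool#createAndRestoreTable before the restore starts.
    */
   public final class SnapshotTtlGuard {

     private SnapshotTtlGuard() {
     }

     /**
      * @param snapshotName name of the snapshot being restored
      * @param ttlSeconds   snapshot TTL in seconds (zero or negative means it never expires)
      * @param creationTime snapshot creation time in epoch milliseconds
      * @throws IOException if the snapshot is already past its TTL
      */
     public static void verifyNotExpired(String snapshotName, long ttlSeconds, long creationTime)
         throws IOException {
       if (ttlSeconds <= 0) {
         return; // no TTL configured, nothing to enforce
       }
       long expiry = creationTime + TimeUnit.SECONDS.toMillis(ttlSeconds);
       if (System.currentTimeMillis() > expiry) {
         // A more specific exception type could be used here instead of plain IOException.
         throw new IOException("Snapshot '" + snapshotName + "' has expired (TTL=" + ttlSeconds
           + "s, created at " + creationTime + "); refusing to restore from it");
       }
     }
   }
   ```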
   
    
   
   2. Incremental backup
   An incremental backup is taken on top of a previous full backup. The incremental 
backup succeeds even if the snapshot behind that full backup has expired.
   
   Steps to reproduce this bug:
   A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
   B. Create a table
   C. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
   D. Wait until the snapshot has expired
   E. Create an incremental backup using `hbase backup create incremental 
hdfs://host5:9000/data/backup -t tableName`
   F. Check that the backup succeeds
   
   We propose to add a snapshot expiration check in 
IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.
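
   As a sketch of what this could look like at that call site, assuming the incremental backup client can look up the `SnapshotDescription` of the ancestor full backup's snapshot and that the master-side helper `SnapshotDescriptionUtils#isExpiredSnapshot(ttl, creationTime, now)` is applicable there (both are assumptions on our part, not verified against a final patch):

   ```java
   import java.io.IOException;

   import org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription;
   import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
   import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

   final class IncrementalBackupTtlCheck {

     private IncrementalBackupTtlCheck() {
     }

     /**
      * Sketch only: fail the incremental backup early if the ancestor full backup's
      * snapshot is already past its TTL. How the descriptor is looked up from the
      * backup metadata is assumed, not shown.
      */
     static void checkAncestorSnapshotNotExpired(SnapshotDescription snapshot) throws IOException {
       long now = EnvironmentEdgeManager.currentTime();
       if (SnapshotDescriptionUtils.isExpiredSnapshot(snapshot.getTtl(), snapshot.getCreationTime(), now)) {
         throw new IOException("Snapshot '" + snapshot.getName()
           + "' backing the previous full backup has expired; refusing to take an incremental backup on top of it");
       }
     }
   }
   ```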
   
    
   
   3. Snapshot procedure
   
   We found that it is possible to create a snapshot with a TTL so low 
that it expires before the SnapshotProcedure has finished. The 
SnapshotProcedure nevertheless finishes normally, as if the snapshot were fine.
   
   Steps to reproduce this bug:
   A. Start an HBase cluster and create a table
   B. Create a snapshot using hbase shell with TTL=1 
   `snapshot 'mytable', 'snapshot1234', {TTL => 1}`
   C. Check that the command finishes without an error, even though the snapshot has 
already expired
   
   This behavior is only possible if the user accidentally sets the TTL 
too low, or if the SnapshotProcedure is interrupted after the 
`SNAPSHOT_WRITE_SNAPSHOT_INFO` state but before it has fully finished.
   
   We propose to add an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT` 
phase, right before the snapshot is marked as completed, to ensure that the 
snapshot has not expired before the SnapshotProcedure is considered successfully 
finished.
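
   To make the timing concrete, the following toy example (plain Java, not HBase code; the 5-second procedure duration is simulated, not measured) illustrates why a 1-second TTL can lapse before the procedure reaches `SNAPSHOT_COMPLETE_SNAPSHOT`:

   ```java
   import java.util.concurrent.TimeUnit;

   /**
    * Toy illustration of the scenario-3 race. With TTL => 1, the snapshot is already
    * past its expiry by the time a SnapshotProcedure that takes a few seconds would
    * reach SNAPSHOT_COMPLETE_SNAPSHOT. All timestamps here are simulated.
    */
   public class SnapshotTtlRaceDemo {
     public static void main(String[] args) {
       long ttlSeconds = 1;                              // TTL => 1 from the shell command
       long creationTime = System.currentTimeMillis();   // roughly when SNAPSHOT_WRITE_SNAPSHOT_INFO runs
       long completionTime = creationTime + TimeUnit.SECONDS.toMillis(5); // assume the procedure takes ~5s

       long expiry = creationTime + TimeUnit.SECONDS.toMillis(ttlSeconds);

       // Prints true: the snapshot expired before the procedure could complete, which is
       // why a check right before marking the snapshot as completed would catch it.
       System.out.println("expired before completion? " + (completionTime > expiry));
     }
   }
   ```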
   
   Granted, we are not sure whether this one is a bug or intended behavior.

