[ 
https://issues.apache.org/jira/browse/HBASE-29296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-29296:
-----------------------------------
    Labels: pull-request-available  (was: )

> Missing critical snapshot expiration checks
> -------------------------------------------
>
>                 Key: HBASE-29296
>                 URL: https://issues.apache.org/jira/browse/HBASE-29296
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, snapshots
>    Affects Versions: 2.6.2
>            Reporter: Dimas Shidqi Parikesit
>            Priority: Critical
>              Labels: pull-request-available
>
> In HBase it is crucial to prevent expired snapshots from being returned to
> clients, to ensure correctness. Existing efforts (e.g., HBASE-27671 and
> HBASE-28704) added snapshot expiration checks in several scenarios to
> avoid such issues. However, we found that this protection is not consistent.
> Specifically, several operations still lack such checks in the latest HBase
> version (5dafa9e). Their patterns are similar to those in the tickets
> mentioned above. In practice, we observed expired snapshots still being
> returned to clients successfully without raising any alarms.
> We have written test cases to prove these issues can be reproduced 
> successfully (see attached). We also attach the manual steps in case anyone 
> is interested. 
> Your insights are very much appreciated. We will continue following up on
> this issue until it is resolved.
> Steps to reproduce (3 scenarios in total)
> 1. Restore
> Restoring from a full backup succeeds even if the underlying snapshot has
> expired. The snapshot can expire if `hbase.master.snapshot.ttl` was set
> when the backup was taken.
> Steps to reproduce this bug:
> A. Start an HBase cluster and set the `hbase.master.snapshot.ttl` config value
> B. Create a table
> C. Create a full backup using `hbase backup create full 
> hdfs://host5:9000/data/backup -t tableName`
> D. Wait until the snapshot has expired
> E. Restore the table using `hbase restore hdfs://host5:9000/data/backup 
> <backup_id>`
> F. Check that the table is restored successfully
> We propose adding a snapshot expiration check in
> RestoreTool.java:createAndRestoreTable to prevent this issue.
>  
> 2. Incremental backup
> An incremental backup is based on a previous full backup. It succeeds even
> if the snapshot of that full backup has expired.
> Steps to reproduce this bug:
> A. Start an HBase cluster and set the `hbase.master.snapshot.ttl` config value
> B. Create a table
> C. Create a full backup using `hbase backup create full 
> hdfs://host5:9000/data/backup -t tableName`
> D. Wait until the snapshot has expired
> E. Create an incremental backup using `hbase backup create incremental 
> hdfs://host5:9000/data/backup -t tableName`
> F. Check that the backup succeeds
> We propose adding a snapshot expiration check in
> IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.
>  
> 3. Snapshot procedure
> We found that it is possible to create a snapshot with a TTL value so low 
> that it will expire before the SnapshotProcedure has finished. The 
> SnapshotProcedure will finish normally as if the snapshot is fine.
> Steps to reproduce this bug:
> A. Start an HBase cluster and create a table
> B. Create a snapshot using the hbase shell with TTL=1:
> `snapshot 'mytable', 'snapshot1234', {TTL => 1}`
> C. Check that the command finished without an error, and the snapshot has 
> expired
> This behavior is only possible if the user accidentally sets the TTL too
> low, or if the SnapshotProcedure is interrupted after the
> `SNAPSHOT_WRITE_SNAPSHOT_INFO` state but before it has fully finished.
> We propose to add an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT` 
> phase right before the snapshot is marked as completed to ensure that the 
> snapshot hasn’t expired before the SnapshotProcedure is considered 
> successfully finished.
> Granted, we’re not sure whether this is a bug or intended behavior.
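
All three scenarios above ask for the same kind of guard: before a snapshot
is used (restore, incremental backup) or marked complete (snapshot procedure),
compare its creation time plus its TTL against the current time. The following
is a minimal standalone sketch of such a guard, not the HBase implementation
and not the attached patch. The class and method names are hypothetical. It
assumes the TTL is given in seconds, the creation time is in epoch
milliseconds, and a TTL of zero or less means the snapshot never expires.

```java
import java.util.concurrent.TimeUnit;

/** Standalone sketch, not HBase code. All names here are hypothetical. */
final class SnapshotTtlCheck {
  private SnapshotTtlCheck() {
  }

  /**
   * Returns true when a snapshot created at createdMillis (epoch ms, assumed
   * non-negative) with a TTL of ttlSeconds is past its lifetime at nowMillis.
   * Assumption: ttlSeconds <= 0 means "never expires".
   */
  static boolean isExpired(long ttlSeconds, long createdMillis, long nowMillis) {
    if (ttlSeconds <= 0 || createdMillis <= 0) {
      return false; // no TTL or no recorded creation time: treat as not expired
    }
    long ageMillis = nowMillis - createdMillis;
    // TimeUnit.toMillis saturates at Long.MAX_VALUE, so a huge TTL cannot overflow.
    return ageMillis > TimeUnit.SECONDS.toMillis(ttlSeconds);
  }

  /** Fails fast instead of letting a caller proceed with an expired snapshot. */
  static void requireNotExpired(String snapshotName, long ttlSeconds, long createdMillis,
      long nowMillis) {
    if (isExpired(ttlSeconds, createdMillis, nowMillis)) {
      throw new IllegalStateException(
        "Snapshot '" + snapshotName + "' has expired (TTL " + ttlSeconds + "s)");
    }
  }

  public static void main(String[] args) {
    long created = 1_000_000L;
    // Half a second after creation with a 1 s TTL, then five seconds after.
    System.out.println(isExpired(1, created, created + 500));
    System.out.println(isExpired(1, created, created + 5_000));
  }
}
```

In this sketch, a caller such as a restore or backup path would invoke
requireNotExpired with the snapshot's recorded creation time and TTL before
touching the snapshot data. For scenario 3, the same call would sit right
before the procedure marks the snapshot as completed, so a snapshot created
with TTL=1 would be rejected if the procedure took longer than one second.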



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
