[ https://issues.apache.org/jira/browse/HBASE-29296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HBASE-29296:
-----------------------------------
    Labels: pull-request-available  (was: )

> Missing critical snapshot expiration checks
> -------------------------------------------
>
>                 Key: HBASE-29296
>                 URL: https://issues.apache.org/jira/browse/HBASE-29296
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, snapshots
>    Affects Versions: 2.6.2
>            Reporter: Dimas Shidqi Parikesit
>            Priority: Critical
>              Labels: pull-request-available
>
> In HBase it is crucial to prevent expired snapshots from being returned to
> clients, to ensure correctness. Existing efforts (e.g., HBASE-27671 and
> HBASE-28704) added snapshot expiration checks in different scenarios to
> avoid such issues. However, we found that this protection is not
> consistent. Specifically, several operations still miss such checks in the
> latest hbase version (5dafa9e). Their patterns are similar to those in the
> tickets mentioned above. In practice, we observed expired snapshots still
> being returned to clients successfully without raising any alarms.
> We have written test cases showing that these issues can be reproduced
> (see attached). We also attach the manual steps in case anyone is
> interested.
> Your insights are very much appreciated. We will continue following up on
> this issue until it is resolved.
>
> Reproducing steps (3 scenarios in total)
>
> 1. Restore
> A restore from a full backup succeeds even if the snapshot has expired.
> This expiration can happen if `hbase.master.snapshot.ttl` was set when the
> backup was taken.
> Steps to reproduce this bug:
> A. Start an HBase cluster, and set the `hbase.master.snapshot.ttl` config
> value
> B. Create a table
> C. Create a full backup using
> `hbase backup create full hdfs://host5:9000/data/backup -t tableName`
> D. Wait until the snapshot has expired
> E. Restore the table using
> `hbase restore hdfs://host5:9000/data/backup <backup_id>`
> F. Check that the table is restored successfully
> We propose adding a snapshot expiration check in
> RestoreTool.java:createAndRestoreTable to prevent this issue.
>
> 2. Incremental backup
> An incremental backup is based on a previous full backup. The incremental
> backup succeeds even if the full backup has expired.
> Steps to reproduce this bug:
> A. Start an HBase cluster, and set the `hbase.master.snapshot.ttl` config
> value
> B. Create a table
> C. Create a full backup using
> `hbase backup create full hdfs://host5:9000/data/backup -t tableName`
> D. Wait until the snapshot has expired
> E. Create an incremental backup using
> `hbase backup create incremental hdfs://host5:9000/data/backup -t tableName`
> F. Check that the backup succeeds
> We propose adding a snapshot expiration check in
> IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this
> issue.
>
> 3. Snapshot procedure
> We found that it is possible to create a snapshot with a TTL value so low
> that it expires before the SnapshotProcedure has finished. The
> SnapshotProcedure still finishes normally, as if the snapshot were fine.
> Steps to reproduce this bug:
> A. Start an HBase cluster and create a table
> B. Create a snapshot using hbase shell with TTL=1:
> `snapshot 'mytable', 'snapshot1234', {TTL => 1}`
> C. Check that the command finished without an error, and that the snapshot
> has expired
> This behavior is only possible if the user accidentally sets the TTL too
> low, or if the SnapshotProcedure is interrupted after the
> `SNAPSHOT_WRITE_SNAPSHOT_INFO` phase but before it has fully finished.
> We propose adding an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT`
> phase, right before the snapshot is marked as completed, to ensure that the
> snapshot has not expired by the time the SnapshotProcedure is considered
> successfully finished.
> Granted, we are not sure whether this one is a bug or intended behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
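
For illustration, the kind of expiration check proposed in the three scenarios above can be sketched as a small standalone Java class. This is a minimal sketch, not HBase's actual code and not the attached patch: `SnapshotTtlCheck`, `isExpired`, and `ensureNotExpired` are hypothetical names, and it assumes the TTL is given in seconds, the creation time in epoch milliseconds, and that a non-positive TTL means the snapshot never expires.

```java
import java.util.concurrent.TimeUnit;

/** Standalone sketch of a snapshot TTL check (hypothetical; no HBase classes used). */
final class SnapshotTtlCheck {
  private SnapshotTtlCheck() {}

  /**
   * Returns true if a snapshot created at createdMillis with the given TTL
   * has expired as of nowMillis.
   * Assumption of this sketch: ttlSeconds <= 0 means "never expires".
   */
  static boolean isExpired(long ttlSeconds, long createdMillis, long nowMillis) {
    if (ttlSeconds <= 0 || createdMillis <= 0 || nowMillis < createdMillis) {
      return false;
    }
    long ageMillis = nowMillis - createdMillis;
    // TimeUnit.toMillis saturates at Long.MAX_VALUE, so a huge TTL cannot overflow.
    return ageMillis > TimeUnit.SECONDS.toMillis(ttlSeconds);
  }

  /** Fails fast instead of silently proceeding with an expired snapshot. */
  static void ensureNotExpired(String name, long ttlSeconds, long createdMillis,
      long nowMillis) {
    if (isExpired(ttlSeconds, createdMillis, nowMillis)) {
      throw new IllegalStateException(
        "Snapshot '" + name + "' has expired (TTL " + ttlSeconds + "s)");
    }
  }
}
```

Under these assumptions, a restore, an incremental backup, or the final phase of a snapshot procedure would call something like `ensureNotExpired(...)` with the current time before using or completing the snapshot, so that the `TTL => 1` case in scenario 3 surfaces as an error rather than a silent success.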