Dimas Shidqi Parikesit created HBASE-29296:
----------------------------------------------
Summary: Missing critical snapshot expiration checks
Key: HBASE-29296
URL: https://issues.apache.org/jira/browse/HBASE-29296
Project: HBase
Issue Type: Bug
Components: backup&restore, snapshots
Affects Versions: 2.6.2
Reporter: Dimas Shidqi Parikesit
In HBase, it is crucial to prevent expired snapshots from being returned to
clients to ensure correctness. Existing efforts (e.g., HBASE-27671 and
HBASE-28704) added snapshot expiration checks in different scenarios to avoid
such issues. However, we found that this protection is not consistent.
Specifically, several operations still lack such checks in the latest HBase
version (5dafa9e). Their patterns are similar to those in the tickets mentioned
above. In practice, we observed expired snapshots still being returned to
clients successfully without any error or warning.
We have written test cases showing that these issues can be reproduced (see
attached). We also include the manual steps in case anyone is interested.
Your insights are very much appreciated. We will continue following up on this
issue until it is resolved.
Steps to reproduce (3 scenarios in total)
# Restore
A restore from a full backup succeeds even if the underlying snapshot has
expired. The snapshot can expire if `hbase.master.snapshot.ttl` was set when
the backup was taken.
Steps to reproduce this bug:
A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
B. Create a table
C. Create a full backup using `hbase backup create full
hdfs://host5:9000/data/backup -t tableName`
D. Wait until the snapshot has expired
E. Restore the table using `hbase restore hdfs://host5:9000/data/backup
<backup_id>`
F. Check that the table is restored successfully
We propose adding a snapshot expiration check in
RestoreTool.java:createAndRestoreTable to prevent this issue.
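As a rough illustration of the kind of check proposed here, the following is a
minimal standalone sketch, not the actual HBase code. The class name
`SnapshotExpirySketch` and both method names are hypothetical; it assumes the
TTL is given in seconds, the creation time in epoch milliseconds, and that a
TTL of 0 or less means the snapshot never expires.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a snapshot expiration guard; not the HBase implementation. */
public class SnapshotExpirySketch {

  /**
   * Returns true if the snapshot is expired at nowMillis.
   * Assumption: ttlSeconds <= 0 (or an unset creation time) means "never expires".
   */
  public static boolean isExpired(long ttlSeconds, long creationTimeMillis, long nowMillis) {
    if (ttlSeconds <= 0 || creationTimeMillis <= 0) {
      return false;
    }
    // Compare elapsed time against the TTL; TimeUnit.toMillis saturates at
    // Long.MAX_VALUE instead of overflowing for very large TTLs.
    return nowMillis - creationTimeMillis > TimeUnit.SECONDS.toMillis(ttlSeconds);
  }

  /** Hypothetical guard a restore path could call before using the snapshot. */
  public static void ensureNotExpired(String snapshotName, long ttlSeconds,
      long creationTimeMillis, long nowMillis) {
    if (isExpired(ttlSeconds, creationTimeMillis, nowMillis)) {
      throw new IllegalStateException("Snapshot '" + snapshotName + "' has expired");
    }
  }

  public static void main(String[] args) {
    long created = 1_000L;
    // TTL of 1 second, checked 4 seconds after creation: the guard rejects it.
    try {
      ensureNotExpired("snapshot1234", 1, created, created + 4_000L);
      System.out.println("restore allowed");
    } catch (IllegalStateException e) {
      System.out.println("restore rejected: " + e.getMessage());
    }
  }
}
```

With a guard of this shape, the restore in step E would fail with an error
instead of silently restoring from an expired snapshot.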
# Incremental backup
An incremental backup is based on a previous full backup. The incremental
backup succeeds even if the snapshot of the full backup has expired.
Steps to reproduce this bug:
A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
B. Create a table
C. Create a full backup using `hbase backup create full
hdfs://host5:9000/data/backup -t tableName`
D. Wait until the snapshot has expired
E. Create an incremental backup using `hbase backup create incremental
hdfs://host5:9000/data/backup -t tableName`
F. Check that the incremental backup completes successfully
We propose adding a snapshot expiration check in
IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.
# Snapshot procedure
We found that it is possible to create a snapshot with a TTL value so low that
it will expire before the SnapshotProcedure has finished. The SnapshotProcedure
will finish normally as if the snapshot is fine.
Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using the HBase shell with TTL=1
`snapshot 'mytable', 'snapshot1234', {TTL => 1}`
C. Check that the command finished without an error and that the snapshot has
already expired
This behavior is only possible if the user accidentally sets the TTL too low
or if the SnapshotProcedure is interrupted after the
`SNAPSHOT_WRITE_SNAPSHOT_INFO` phase but before it has fully finished.
We propose adding an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT` phase,
right before the snapshot is marked as completed, to ensure that the snapshot
has not expired before the SnapshotProcedure is considered successfully
finished.
Granted, we are not sure whether this is a bug or intended behavior.
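To make the timing concrete, here is a small standalone model of the proposed
completion-phase check. It is only a sketch under stated assumptions: the class
`CompleteSnapshotCheckSketch`, the `Outcome` enum, and the method
`completeSnapshot` are hypothetical names and do not exist in HBase, and a zero
or negative TTL is assumed to mean "no expiry".

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of an expiration check in the completion phase. */
public class CompleteSnapshotCheckSketch {

  public enum Outcome { COMPLETED, FAILED_EXPIRED }

  /**
   * createdAt: when the snapshot info was written.
   * ttl: snapshot TTL (assumption: zero or negative means no expiry).
   * completionTime: when the completion phase runs.
   */
  public static Outcome completeSnapshot(Instant createdAt, Duration ttl,
      Instant completionTime) {
    boolean hasTtl = !ttl.isZero() && !ttl.isNegative();
    // Fail the procedure if the TTL elapsed while the snapshot was being taken.
    if (hasTtl && completionTime.isAfter(createdAt.plus(ttl))) {
      return Outcome.FAILED_EXPIRED;
    }
    return Outcome.COMPLETED;
  }

  public static void main(String[] args) {
    Instant created = Instant.ofEpochSecond(1_000);
    // TTL of 1 second, but the procedure needs 3 seconds to reach completion.
    System.out.println(
        completeSnapshot(created, Duration.ofSeconds(1), created.plusSeconds(3)));
    // A TTL of one day comfortably outlives the same 3-second procedure.
    System.out.println(
        completeSnapshot(created, Duration.ofDays(1), created.plusSeconds(3)));
  }
}
```

In this model, the `TTL => 1` case from step B would fail at completion rather
than report success for a snapshot that is already expired.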