[ 
https://issues.apache.org/jira/browse/HBASE-29296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimas Shidqi Parikesit updated HBASE-29296:
-------------------------------------------
    Description: 
In HBase, it is crucial to prevent expired snapshots from being returned to 
clients. There have been earlier efforts (e.g., HBASE-27671 and HBASE-28704) 
to add snapshot expiration checks in different scenarios to avoid such issues. 
However, we found that this protection is not consistent: several operations 
still lack such checks in the latest HBase version (5dafa9e), and their 
patterns are similar to those in the tickets mentioned above. In practice, we 
observed expired snapshots still being returned to clients successfully 
without raising any error or warning.

We have written test cases that reproduce these issues (see attached). We also 
include the manual steps below in case anyone is interested. 

Your insights are very much appreciated. We will continue to follow up on this 
issue until it is resolved.

Reproducing steps (3 scenarios in total)

1. Restore
Restoring from a full backup succeeds even if the underlying snapshot has 
expired. The snapshot can expire if `hbase.master.snapshot.ttl` was set when 
the backup was taken.

Steps to reproduce this bug:
A. Start an HBase cluster and set the `hbase.master.snapshot.ttl` config value
B. Create a table
C. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
D. Wait until the snapshot has expired
E. Restore the table using `hbase restore hdfs://host5:9000/data/backup 
<backup_id>`
F. Check that the table is restored successfully

We propose adding a snapshot expiration check in 
RestoreTool.java:createAndRestoreTable to prevent this issue.
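
As a rough illustration of the kind of check proposed here, the sketch below 
shows a standalone expiration predicate and a guard that refuses to proceed 
when the snapshot is expired. It is not HBase code: the class name 
`SnapshotExpirySketch` and both method names are hypothetical, and it assumes 
that the TTL is given in seconds, that the creation time is in milliseconds, 
and that a TTL of 0 or less means the snapshot never expires.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, self-contained sketch; not part of HBase.
public final class SnapshotExpirySketch {
    private SnapshotExpirySketch() {}

    /**
     * Returns true when the snapshot is older than its TTL.
     * Assumptions: ttlSeconds <= 0 means "never expires";
     * creationTimeMs and nowMs are epoch milliseconds.
     */
    static boolean isExpired(long ttlSeconds, long creationTimeMs, long nowMs) {
        if (ttlSeconds <= 0) {
            return false;
        }
        // toMillis saturates at Long.MAX_VALUE, so a huge TTL cannot overflow.
        long ttlMs = TimeUnit.SECONDS.toMillis(ttlSeconds);
        long ageMs = nowMs - creationTimeMs;
        return ageMs > ttlMs;
    }

    /** Guard to call before using a snapshot, e.g. before a restore. */
    static void ensureNotExpired(String snapshotName, long ttlSeconds,
                                 long creationTimeMs, long nowMs) {
        if (isExpired(ttlSeconds, creationTimeMs, nowMs)) {
            throw new IllegalStateException(
                "Snapshot '" + snapshotName + "' has expired; refusing to use it");
        }
    }
}
```

In a restore path, a guard like `ensureNotExpired` would run before the table 
is created from the snapshot, so that the restore fails instead of silently 
succeeding.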

 

2. Incremental backup
An incremental backup is based on a previous full backup. The incremental 
backup will succeed even if the full backup has expired.

Steps to reproduce this bug:
A. Create a table
B. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
C. Wait until the snapshot has expired
D. Create an incremental backup using `hbase backup create incremental 
hdfs://host5:9000/data/backup -t tableName`
E. Check that the backup succeeds

We propose adding a snapshot expiration check in 
IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.

 

3. Snapshot procedure

We found that it is possible to create a snapshot with a TTL value so low that 
it will expire before the SnapshotProcedure has finished. The SnapshotProcedure 
will finish normally as if the snapshot is fine.

Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1 
`snapshot 'mytable', 'snapshot1234', {TTL => 1}`
C. Check that the command finished without an error, and the snapshot has 
expired

This behavior is only possible if the user accidentally sets the TTL too low, 
or if the SnapshotProcedure is interrupted after the 
`SNAPSHOT_WRITE_SNAPSHOT_INFO` state but before it has fully finished.

We propose to add an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT` phase 
right before the snapshot is marked as completed to ensure that the snapshot 
hasn’t expired before the SnapshotProcedure is considered successfully finished.
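
To make the timing concrete, the following sketch models a completion step 
that re-checks the TTL against the current time right before the snapshot is 
declared complete. It is only an illustration and does not reflect the actual 
SnapshotProcedure code: `SnapshotCompletionSketch`, `Outcome`, and 
`completeSnapshot` are hypothetical names, the clock is injected so the race 
can be simulated, and a TTL in seconds with 0 meaning "never expires" is 
assumed.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical, self-contained sketch; not the real SnapshotProcedure.
public final class SnapshotCompletionSketch {
    private SnapshotCompletionSketch() {}

    enum Outcome { COMPLETED, EXPIRED_BEFORE_COMPLETION }

    /**
     * Models the final step of a snapshot: before marking it complete,
     * compare its age (measured from creationTimeMs) with its TTL.
     * Assumption: ttlSeconds <= 0 means the snapshot never expires.
     */
    static Outcome completeSnapshot(long ttlSeconds, long creationTimeMs,
                                    LongSupplier clockMs) {
        long ageMs = clockMs.getAsLong() - creationTimeMs;
        boolean expired = ttlSeconds > 0
            && ageMs > TimeUnit.SECONDS.toMillis(ttlSeconds);
        return expired ? Outcome.EXPIRED_BEFORE_COMPLETION : Outcome.COMPLETED;
    }
}
```

With TTL=1, as in the reproduction above, a procedure that reaches this step 
three seconds after the snapshot info was written would report 
`EXPIRED_BEFORE_COMPLETION` instead of finishing as if the snapshot were fine.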

Granted, we’re not sure whether this one is a bug or intended behavior.

  was:
In HBase it is crucial to prevent expired snapshots returned to clients to 
ensure correctness. There have been existing efforts  (e.g., HBASE-27671 and  
HBASE-28704) adding snapshot expiration checks in different scenarios to avoid 
such issues. However, we found such protection is not consistent. Specifically, 
several operations still miss such checks in the latest hbase version 
(5dafa9e). Their patterns are similar to the previous tickets mentioned above. 
In practice, we observed expired snapshots still returning to clients 
successfully without generating any alarms.

We have written test cases to prove these issues can be reproduced successfully 
(see attached). We also attach the manual steps in case anyone is interested. 

Your insights are very much appreciated. We will continue following up this 
issue until it is resolved.

Reproducing steps (3 scenarios in total)

1. Restore
Doing a restore on full backup will succeed even if the snapshot has expired. 
This expiration can happen if during the backup, `hbase.master.snapshot.ttl` 
was set.

Steps to reproduce this bug:
A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
B. Create a table
C. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
D. Wait until the snapshot has expired
E. Restore the table using `hbase restore hdfs://host5:9000/data/backup 
<backup_id>`
F. Check that the table is restored successfully

We propose to add a snapshot expiration check on 
RestoreTool.java:createAndRestoreTable to prevent this issue.

 

2. Incremental backup
Incremental backup is done based on a previous full backup. Incremental backup 
will succeed even if the full backup has expired.

Steps to reproduce this bug:
A. Start an HBase cluster, and set `hbase.master.snapshot.ttl` config value
B. Create a table
C. Create a full backup using `hbase backup create full 
hdfs://host5:9000/data/backup -t tableName`
D. Wait until the snapshot has expired
E. Restore the table using `hbase restore hdfs://host5:9000/data/backup 
<backup_id>`
F. Check that the table is restored successfully

We propose to add a snapshot expiration check on 
IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.

 

3. Snapshot procedure

We found that it is possible to create a snapshot with a TTL value so low that 
it will expire before the SnapshotProcedure has finished. The SnapshotProcedure 
will finish normally as if the snapshot is fine.

Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1 
`snapshot 'mytable', 'snapshot1234', \{TTL => 1}`
C. Check that the command finished without an error, and the snapshot has 
expired

This behavior is only possible if the user accidentally sets the TTL to be too 
low or if the SnapshotProcedure is interrupted after the 
`SNAPSHOT_WRITE_SNAPSHOT_INFO` but before it’s fully finished.

We propose to add an expiration check in the `SNAPSHOT_COMPLETE_SNAPSHOT` phase 
right before the snapshot is marked as completed to ensure that the snapshot 
hasn’t expired before the SnapshotProcedure is considered successfully finished.

Granted, we’re not sure whether this one is a bug or an intended behavior


> Missing critical snapshot expiration checks
> -------------------------------------------
>
>                 Key: HBASE-29296
>                 URL: https://issues.apache.org/jira/browse/HBASE-29296
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, snapshots
>    Affects Versions: 2.6.2
>            Reporter: Dimas Shidqi Parikesit
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
