[ 
https://issues.apache.org/jira/browse/HBASE-29346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-29346.
-------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.6.3
                   2.5.12
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches.

Thanks [~prathyu6] for contributing!

> Multiple Snapshot restores on same restoreDir ends up in Dataloss
> -----------------------------------------------------------------
>
>                 Key: HBASE-29346
>                 URL: https://issues.apache.org/jira/browse/HBASE-29346
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
>
> We restore snapshots to a temporary directory for snapshot reads.
> When multiple SnapshotManifests (both created on the same table, at times t1 
> and t2 with t2 > t1) are restored into the same temporary directory, the 
> restore deletes the merge parent regions from /hbase/data/ instead of from 
> the temporary restore folder. This happens as part of the restore-regions 
> step of 
> [RestoreSnapshotHelper|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L416]
> Steps to reproduce:
>  # Create a snapshot of a table.
>  # Restore that snapshot into a temporary restoreDirectory instead of onto 
> the same table.
>  # Delete that snapshot from the shell.
>  # Disable compactions and trigger a merge.
>  # Create another snapshot.
>  # Restore that snapshot into the same restoreDirectory used in step 2.
>  # The restore archives the closed parent regions from /hbase/data/ of the 
> actual table instead of from the temporary restoreDirectory, leaving dangling 
> references in the daughter region, which results in data loss.
>  # Restart the regionserver holding the merged daughter region; the region 
> ends up in FAILED_OPEN state because its reference files are dangling: the 
> parent store files have already been archived.
> Proposed immediate fix:
> RestoreSnapshotHelper performs {{restore, add, remove}} operations on regions.
> The restore and add operations use the {{tableDir}} of RestoreSnapshotHelper 
> (which is constructed from {{restoreDir}}) to construct {{RegionDir}} paths.
> We should use the same strategy in the removeRegions path. Currently, 
> [RestoreSnapshotHelper.removeHdfsRegion|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L416]
>  uses 
> [HFileArchiver.archiveRegion|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java#L104]
> which constructs the table directory from rootDir instead of restoreDir.
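
To illustrate the path mix-up described above, here is a minimal sketch. It is
not HBase code: the class RegionDirSketch, its helpers, the namespace/table
names, and the region name are all hypothetical, and the
<baseDir>/data/<namespace>/<table>/<region> layout is an assumption made for
the example. It only shows how the same region name resolves to different
directories depending on whether the table directory is built from rootDir or
from restoreDir.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; these helpers are hypothetical, not HBase APIs. */
public class RegionDirSketch {

    /** Assumed layout: <baseDir>/data/<namespace>/<table>. */
    static Path tableDir(Path baseDir, String namespace, String table) {
        return baseDir.resolve("data").resolve(namespace).resolve(table);
    }

    /** Region directory derived from an already-resolved table directory. */
    static Path regionDir(Path tableDir, String encodedRegionName) {
        return tableDir.resolve(encodedRegionName);
    }

    public static void main(String[] args) {
        Path rootDir = Paths.get("/hbase");
        Path restoreDir = Paths.get("/tmp/restore"); // hypothetical restore location
        String region = "parentRegionA";             // hypothetical encoded region name

        // Reported behaviour: the remove path rebuilds the table dir from rootDir,
        // so the live table's parent region is the one that gets archived.
        Path reported = regionDir(tableDir(rootDir, "default", "t1"), region);

        // Proposed strategy: reuse the table dir built from restoreDir,
        // as the restore and add paths already do.
        Path proposed = regionDir(tableDir(restoreDir, "default", "t1"), region);

        System.out.println("archived (reported): " + reported);
        System.out.println("archived (proposed): " + proposed);
    }
}
```

With the proposed strategy the removed region always lies under restoreDir, so
the live table's store files under /hbase/data/ are never touched by a restore
into a temporary directory.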



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
