[ https://issues.apache.org/jira/browse/HBASE-29346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prathyusha updated HBASE-29346:
-------------------------------
    Description: 
We restore snapshots to a temporary directory for snapshot reads.
When two SnapshotManifests of the same table (taken at t1 and t2, t2 > t1) are restored to the same temporary directory, the second restore deletes the merge parent regions from {color:#de350b}/hbase/data/ instead of from the temporary restore folder{color}. This happens as part of restoring regions in [RestoreSnapshotHelper|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L416].

Reproduce steps
# Create a snapshot of a table.
# Restore that snapshot to a temporary restoreDirectory instead of to the same table.
# Delete that snapshot from the shell.
# Disable compactions and trigger a merge.
# Create another snapshot.
# Restore that snapshot to the same restoreDirectory used in step 2.
# The restore archives the closed parent regions from /hbase/data/ of the actual table instead of from the temporary restoreDirectory, leaving dangling references in the daughter region, which results in data loss.
# Restart the regionserver holding the merged daughter region; the region ends up in FAILED_OPEN state because its reference files are dangling and the parent store files are already archived.

Proposed immediate fix -
RestoreSnapshotHelper does {{restore, add, remove}} regions.
Restore/Add operations use the {{tableDir}} of RestoreSnapshotHelper (which is constructed from {{restoreDir}}) to construct {{RegionDir}} paths.
We should follow the same strategy in the removeRegions path. Currently, [RestoreSnapshotHelper.removeHdfsRegion|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L416] uses [HFileArchiver.archiveRegion|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java#L104], which constructs the table directory from rootDir instead of restoreDir.

> Multiple Snapshot restores on same restoreDir ends up in Dataloss
> -----------------------------------------------------------------
>
>                 Key: HBASE-29346
>                 URL: https://issues.apache.org/jira/browse/HBASE-29346
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Critical
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
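
The path mismatch behind the proposed fix can be illustrated with a small, self-contained sketch. It deliberately uses no HBase classes: {{RegionDirSketch}}, {{tableDir}}, {{regionDirViaRootDir}} and {{regionDirViaTableDir}} are hypothetical names, and the example paths and the {{<base>/data/<namespace>/<table>/<region>}} layout under the restore directory are assumptions made for illustration. It only shows that rebuilding a region directory from rootDir points at the live table, while resolving it against the restore {{tableDir}} stays inside the temporary directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: plain path arithmetic, no HBase classes involved. */
public class RegionDirSketch {

  // Hypothetical helper: table directory under a base directory,
  // assuming a <base>/data/<namespace>/<table> layout.
  public static Path tableDir(Path baseDir, String namespace, String table) {
    return baseDir.resolve("data").resolve(namespace).resolve(table);
  }

  // Problematic pattern: the region directory is rebuilt from rootDir,
  // so it always points at the live table's region.
  public static Path regionDirViaRootDir(
      Path rootDir, String namespace, String table, String region) {
    return tableDir(rootDir, namespace, table).resolve(region);
  }

  // Proposed pattern: the region directory is resolved against the tableDir
  // that the restore already works in (derived from restoreDir).
  public static Path regionDirViaTableDir(Path restoreTableDir, String region) {
    return restoreTableDir.resolve(region);
  }

  public static void main(String[] args) {
    Path rootDir = Paths.get("/hbase");
    Path restoreDir = Paths.get("/tmp/restore"); // assumed example path
    Path restoreTableDir = tableDir(restoreDir, "default", "t1");

    // Region directory that would be archived when derived from rootDir.
    System.out.println(regionDirViaRootDir(rootDir, "default", "t1", "parentRegionA"));
    // Region directory that should be removed: the copy under restoreDir.
    System.out.println(regionDirViaTableDir(restoreTableDir, "parentRegionA"));
  }
}
```

The two printed paths differ only in their base directory, which is the whole problem: a removal routed through the first one touches the live table's parent regions, while the second one is confined to the temporary restore location.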