[ https://issues.apache.org/jira/browse/HBASE-27366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603609#comment-17603609 ]
Duo Zhang commented on HBASE-27366:
-----------------------------------

The LockProcedure takes the table exclusive lock of the ProcedureScheduler, while split/merge take the shared lock, so I do not think the problem is that we do not use LockProcedure in the split/merge procedures. Maybe the problem is how we deal with master restart?

Also, on master and branch-2 we introduced a SnapshotProcedure. Maybe you could try it and see whether this approach solves the problem.

> split or merge removed region under snapshot
> --------------------------------------------
>
>                 Key: HBASE-27366
>                 URL: https://issues.apache.org/jira/browse/HBASE-27366
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>   Affects Versions: 2.4.10
>            Reporter: Huaxiang Sun
>            Priority: Major
>
> We run into snapshot failures for one table with large number of regions. The
> event sequence is like the following:
>
> # Snapshot process lists all regions for one table.
> # Normalize kicks in to split some regions for the table under snapshot.
> # split finishes and major compaction finishes. The parent region is moved
> to archive.
> # When the Snapshot processes the parent region, it does not exist and
> snapshot fails.
> Since snapshot process acquires the table lock, but there is no table lock
> acquired in split or merge process, they crash into each other.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)