stack created HBASE-20152:
-----------------------------

    Summary: [AMv2] DisableTableProcedure versus ServerCrashProcedure
    Key: HBASE-20152
    URL: https://issues.apache.org/jira/browse/HBASE-20152
    Project: HBase
    Issue Type: Bug
    Components: amv2
    Reporter: stack
    Assignee: stack

Seeing a small spate of issues where disabled tables/regions are being assigned. Usually they happen when a DisableTableProcedure is running concurrently with a ServerCrashProcedure (SCP). See below. See associated HBASE-20131. This is the umbrella issue for fixing.

h2. Deadlock

From HBASE-20137, 'TestRSGroups is Flakey', https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325

{code}
 * SCP is running because a server was aborted in test.
 * SCP starts AssignProcedure of region X from crashed server.
 * DisableTableProcedure runs because test has finished and we're doing table delete. Queues UnassignProcedure for region X.
 * Disable Unassign gets lock on region X first.
 * SCP AssignProcedure tries to get lock, waits on lock.
 * DisableTableProcedure UnassignProcedure RPC fails because server is down (that's why the SCP).
 * Tries to expire the server it failed the RPC against. Fails (currently being SCP'd).
 * DisableTableProcedure Unassign is suspended. It is a suspend with lock on region X held.
 * SCP can't run because lock on X is held.
 * Test times out.
{code}

h2. Delete of online Regions

Saw this in nightly failure #452 for branch-2 in TestSplitTransactionOnCluster.org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

{code}
 * DisableTableProcedure is queued before SCP.
 * DisableTableProcedure Unassign fails because it can't RPC to crashed server and can't expire it.
 * Unassign is stuck in suspend.
 * SCP runs and cleans up suspended Disable Unassign.
 * SCP completes, which includes assign of Disable Unassign region.
 * Disable Unassign completes.
 * Disable completes.
 * A scheduled DropTableProcedure runs (it's end of test).
 * Succeeds deleting regions that are actually assigned (see above where SCP assigned region).
{code}
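To make the Deadlock interleaving easier to follow, here is a toy single-threaded replay of it. This is only a sketch: the class, the method names, and the `releaseLockOnSuspend` knob are hypothetical stand-ins for illustration, not HBase APIs, and the knob is not meant to suggest what the eventual fix looks like. All it shows is that a suspend with the region lock held leaves the SCP's assign unable to proceed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the Deadlock scenario. Nothing here is an HBase API;
// every name is a hypothetical stand-in used for illustration only.
class RegionLockSketch {

    /** Owner of the exclusive lock on region X; null when the lock is free. */
    static String lockOwner;

    /** Takes the lock if it is free or already held by proc. */
    static boolean tryLock(String proc) {
        if (lockOwner == null) {
            lockOwner = proc;
            return true;
        }
        return lockOwner.equals(proc);
    }

    /**
     * Replays the interleaving from the description.
     * releaseLockOnSuspend is a hypothetical knob, not an HBase setting.
     * Returns the event trace; the last entry is "DEADLOCK" or "SCP_DONE".
     */
    static List<String> replay(boolean releaseLockOnSuspend) {
        lockOwner = null;
        List<String> trace = new ArrayList<>();
        boolean serverDown = true;           // the test aborted the server
        boolean serverBeingProcessed = true; // SCP is already handling that server

        // Disable's Unassign gets the lock on region X first.
        tryLock("Unassign");
        trace.add("Unassign holds lock on X");

        // SCP's Assign wants the same lock and has to wait.
        boolean assignHasLock = tryLock("Assign");
        trace.add(assignHasLock ? "Assign holds lock on X" : "Assign waits on lock");

        // The Unassign RPC fails because the server is down, and expiring
        // the server fails because it is already being SCP'd.
        boolean rpcOk = !serverDown;
        boolean expired = !rpcOk && !serverBeingProcessed;
        if (!rpcOk && !expired) {
            trace.add("Unassign suspended");
            if (releaseLockOnSuspend) {
                lockOwner = null; // hypothetical: give up the lock while suspended
            }
        }

        // SCP's Assign retries the lock; with the lock still held it is stuck.
        trace.add(tryLock("Assign") ? "SCP_DONE" : "DEADLOCK");
        return trace;
    }

    public static void main(String[] args) {
        for (String event : replay(false)) {
            System.out.println(event);
        }
    }
}
```

Calling `replay(false)` ends in `DEADLOCK`, matching the test timeout described above. Calling `replay(true)` ends in `SCP_DONE`, but only because the toy model drops the lock on suspend.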