[ https://issues.apache.org/jira/browse/HBASE-13351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494848#comment-14494848 ]
Josh Elser commented on HBASE-13351:
------------------------------------

Ah! I think I got to the bottom of why this deadlocks without enough priority-pool threads. {{MasterRpcServices#reportRegionStateTransition}} ultimately makes another {{Get}} to meta, which is automatically put at priority 200 (because it's a request against meta). So the region servers fire off {{reportRegionStateTransition}} calls to the Master. Each handler servicing one of those calls issues its own {{Get}} against meta, and those {{Get}}s end up back in the same priority thread pool. Once every priority handler is blocked waiting on a {{Get}}, no threads are left to serve the {{Get}}s themselves. Boom, deadlock.

The confusing part (or at least the part I don't understand) is why this goes back to the Master and not to a RS. Maybe it's because the Master is also acting as a RS? Maybe I just don't fully understand how this works :)

{noformat}
Daemon Thread [PriorityRpcServer.handler=1,queue=1,port=64100] (Suspended)
    waiting for: AsyncCall (id=891)
    Object.wait(long) line: not available [native method]
    AsyncCall(Object).wait(long, int) line: 461
    AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
    AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
    AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
    AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
    AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
    AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
    ClientProtos$ClientService$BlockingStub.get(RpcController, ClientProtos$GetRequest) line: 32391
    HTable$3.call(int) line: 686
    HTable$3.call(int) line: 1
    RpcRetryingCallerImpl<T>.callWithRetries(RetryingCallable<T>, int) line: 117
    HTable.get(Get) line: 694
    MetaTableAccessor.getTableState(Connection, TableName) line: 1075
    TableStateManager.readMetaState(TableName) line: 187
    TableStateManager.getTableState(TableName) line: 171
    TableStateManager.isTableState(TableName, TableState$State...) line: 130
    AssignmentManager.onRegionOpen(RegionState, HRegionInfo, ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2183
    AssignmentManager.onRegionTransition(ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2754
    MasterRpcServices.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 1264
    RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message) line: 8623
    RpcServer.call(BlockingService, MethodDescriptor, Message, CellScanner, long, MonitoredRPCHandler) line: 2095
    CallRunner.run() line: 101
    BalancedQueueRpcExecutor(RpcExecutor).consumerLoop(BlockingQueue<CallRunner>) line: 130
    RpcExecutor$2.run() line: 107
    Thread.run() line: 745

Daemon Thread [PostOpenDeployTasks:d923ab785d95578230ec49fbb1f40e8e] (Suspended)
    waiting for: AsyncCall (id=808)
    Object.wait(long) line: not available [native method]
    AsyncCall(Object).wait(long, int) line: 461
    AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
    AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
    AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
    AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
    AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
    AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
    RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 9030
    MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).reportRegionStateTransition(RegionServerStatusProtos$RegionStateTransition$TransitionCode, long, HRegionInfo...) line: 1949
    MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).postOpenDeployTasks(Region) line: 1884
    OpenRegionHandler$PostOpenDeployTasksThread.run() line: 241
{noformat}

> Annotate internal MasterRpcServices methods with admin priority
> ---------------------------------------------------------------
>
>                 Key: HBASE-13351
>                 URL: https://issues.apache.org/jira/browse/HBASE-13351
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE-13351-v1.patch, HBASE-13351-v2.patch,
> HBASE-13351-v3.patch, HBASE-13351.patch
>
>
> HBASE-12071, among other things, introduced annotating RPC methods to give
> certain methods priority over others. Namely, this helps ensure that client
> requests cannot starve out internal RPC between master and regionserver.
> Similarly, we can do the same thing for Master RPC methods that are invoked
> by RS's.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
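The starvation described in the comment above can be reproduced outside HBase with a plain fixed-size pool. This is a hypothetical, minimal Java sketch, not HBase code: the pool stands in for the Master's priority handler pool, each "report" task stands in for {{reportRegionStateTransition}}, and the nested task stands in for the meta {{Get}}. The class and method names, pool sizes, and the 1-second timeout are all assumptions for illustration. Routing the nested call to a separate pool is shown only as a general way to break the cycle; it is not necessarily what the HBASE-13351 patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of handler-pool starvation; names do not correspond to HBase classes.
public class PriorityPoolStarvation {

    // Submits one "report" per handler thread; each report issues a nested "meta get"
    // on metaPool and blocks on it. Returns true only if every report finishes in time.
    static boolean allReportsComplete(ExecutorService handlerPool, ExecutorService metaPool,
                                      int handlers, long timeoutMs) throws Exception {
        // Ensures every handler thread is busy before any nested call is issued,
        // so the shared-pool case deadlocks deterministically.
        CountDownLatch allHandlersBusy = new CountDownLatch(handlers);
        List<Future<String>> reports = new ArrayList<>();
        for (int i = 0; i < handlers; i++) {
            final int regionId = i;
            Callable<String> report = () -> {
                allHandlersBusy.countDown();
                allHandlersBusy.await();
                // Stand-in for the Get against meta.
                Future<String> metaGet = metaPool.submit(() -> "ENABLED:" + regionId);
                return metaGet.get(); // the handler thread blocks here
            };
            reports.add(handlerPool.submit(report));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            for (Future<String> f : reports) {
                long remaining = deadline - System.nanoTime();
                f.get(Math.max(remaining, 0L), TimeUnit.NANOSECONDS);
            }
            return true;
        } catch (TimeoutException e) {
            return false; // a timeout here means the handlers are starved
        }
    }

    // sharedPool == true: nested calls go back into the handler pool (the deadlock).
    // sharedPool == false: nested calls run on a separate single-thread pool.
    public static boolean simulate(boolean sharedPool, int handlers) throws Exception {
        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        ExecutorService metaPool = sharedPool ? handlerPool : Executors.newFixedThreadPool(1);
        try {
            return allReportsComplete(handlerPool, metaPool, handlers, 1000);
        } finally {
            // Interrupts blocked handlers so the JVM can exit.
            handlerPool.shutdownNow();
            metaPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool completes:   " + simulate(true, 3));  // prints false
        System.out.println("separate pools complete: " + simulate(false, 3)); // prints true
    }
}
```

Adding handler threads only helps in this model while there are fewer concurrent reports than threads, which matches the "without sufficient priority-pool threads" observation; separating the pools removes the dependency cycle regardless of pool size.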