[ 
https://issues.apache.org/jira/browse/HBASE-13351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494848#comment-14494848
 ] 

Josh Elser commented on HBASE-13351:
------------------------------------

Ah! I think I got to the bottom of why this deadlocks without sufficient 
priority-pool threads. {{MasterRpcServices#reportRegionStateTransition}} 
ultimately issues another {{Get}} against meta, which is automatically assigned 
priority 200 (because it's a request against meta).

So, the region server fires off {{reportRegionStateTransition}} calls to the 
Master, and the resulting meta requests end up going back into the same thread 
pool, which has no threads left to handle them. Boom, deadlock. The confusing 
part (or at least the part I don't understand) is why the meta request goes 
back to the Master and not to an RS. Maybe it's because the Master is acting 
as an RS? Maybe I just don't completely understand how this works :)
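
To make the starvation concrete, here is a minimal, self-contained sketch using 
plain java.util.concurrent (not HBase code). The class and method names are 
made up for illustration, and the fixed-size pool only stands in for the 
priority handler pool: each "report" task occupies a handler and then blocks on 
a nested task submitted to the same pool, which is my reading of what the 
handler in the trace is doing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; nothing here is part of HBase.
final class PriorityPoolStarvation {

    /**
     * Runs `reports` outer calls on a pool of `handlers` threads. Each outer
     * call submits a nested call to the SAME pool and blocks on its result.
     * Returns true only if every outer call finishes within `timeoutMs`.
     */
    static boolean allCallsComplete(int handlers, int reports, long timeoutMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        // Hold the running outer calls until as many as can run concurrently
        // have started, so the outcome does not depend on thread scheduling.
        CountDownLatch handlersBusy = new CountDownLatch(Math.min(handlers, reports));
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < reports; i++) {
                results.add(pool.submit(() -> {
                    handlersBusy.countDown();
                    handlersBusy.await();
                    // Stand-in for the nested meta request: it needs a free
                    // thread from the very pool this task is occupying.
                    Future<String> nested = pool.submit(() -> "ENABLED");
                    return nested.get();
                }));
            }
            for (Future<String> result : results) {
                try {
                    result.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return false; // outer call never finished: pool is starved
                }
            }
            return true;
        } finally {
            pool.shutdownNow(); // interrupt any handlers still blocked
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("2 handlers, 2 reports: completed="
                + allCallsComplete(2, 2, 500));
        System.out.println("3 handlers, 2 reports: completed="
                + allCallsComplete(3, 2, 5000));
    }
}
```

In this sketch the calls only complete when the pool has at least one more 
thread than there are concurrent outer calls; otherwise every handler is parked 
waiting on work that can never be scheduled.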

{noformat}
Daemon Thread [PriorityRpcServer.handler=1,queue=1,port=64100] (Suspended)
        waiting for: AsyncCall  (id=891)
        Object.wait(long) line: not available [native method]
        AsyncCall(Object).wait(long, int) line: 461
        AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
        AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
        AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
        AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
        AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
        AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
        ClientProtos$ClientService$BlockingStub.get(RpcController, ClientProtos$GetRequest) line: 32391
        HTable$3.call(int) line: 686
        HTable$3.call(int) line: 1
        RpcRetryingCallerImpl<T>.callWithRetries(RetryingCallable<T>, int) line: 117
        HTable.get(Get) line: 694
        MetaTableAccessor.getTableState(Connection, TableName) line: 1075
        TableStateManager.readMetaState(TableName) line: 187
        TableStateManager.getTableState(TableName) line: 171
        TableStateManager.isTableState(TableName, TableState$State...) line: 130
        AssignmentManager.onRegionOpen(RegionState, HRegionInfo, ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2183
        AssignmentManager.onRegionTransition(ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2754
        MasterRpcServices.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 1264
        RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message) line: 8623
        RpcServer.call(BlockingService, MethodDescriptor, Message, CellScanner, long, MonitoredRPCHandler) line: 2095
        CallRunner.run() line: 101
        BalancedQueueRpcExecutor(RpcExecutor).consumerLoop(BlockingQueue<CallRunner>) line: 130
        RpcExecutor$2.run() line: 107
        Thread.run() line: 745
  
Daemon Thread [PostOpenDeployTasks:d923ab785d95578230ec49fbb1f40e8e] (Suspended)
        waiting for: AsyncCall  (id=808)
        Object.wait(long) line: not available [native method]
        AsyncCall(Object).wait(long, int) line: 461
        AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
        AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
        AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
        AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
        AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
        AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
        RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 9030
        MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).reportRegionStateTransition(RegionServerStatusProtos$RegionStateTransition$TransitionCode, long, HRegionInfo...) line: 1949
        MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).postOpenDeployTasks(Region) line: 1884
        OpenRegionHandler$PostOpenDeployTasksThread.run() line: 241
{noformat}

> Annotate internal MasterRpcServices methods with admin priority
> ---------------------------------------------------------------
>
>                 Key: HBASE-13351
>                 URL: https://issues.apache.org/jira/browse/HBASE-13351
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE-13351-v1.patch, HBASE-13351-v2.patch, 
> HBASE-13351-v3.patch, HBASE-13351.patch
>
>
> HBASE-12071, among other things, introduced annotating RPC methods to give 
> certain methods priority over others. Namely, this helps ensure that client 
> requests cannot starve out internal RPC between master and regionserver.
> Similarly, we can do the same thing for Master RPC methods that are invoked 
> by RS's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
