[ https://issues.apache.org/jira/browse/PHOENIX-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067544#comment-16067544 ]

Samarth Jain commented on PHOENIX-3983:
---------------------------------------

The region server hosting SYSTEM.CATALOG issues index rebuild scans against the
data table region servers. If the ServerRpcControllerFactory is configured on
the region servers, these scan RPCs have their priority set to the INDEX
priority, so on the destination servers they are handled by the INDEX handlers.
In turn, these index handlers perform local writes to the data table, which
then trigger remote RPCs to the index tables. Those RPCs are again handled by
the index handlers, this time on the region servers hosting the index table
regions. This can result in a deadlock. Consider this simple scenario:

Three region server setup:
RS1 - SYSTEM.CATALOG
RS2 - DATA_TABLE, INDEX_TABLE
RS3 - DATA_TABLE, INDEX_TABLE
Number of index RPC handlers per server: 1 (for simplicity). Call the lone
index handler T1 on RS2 and T1' on RS3.
Number of regular RPC handlers per server: 1

RS1 issues scans against the data table region servers. These scans are handled
on RS2 by T1 and on RS3 by T1'.
The index handlers T1 on RS2 and T1' on RS3 then write locally to their data
table regions, which results in remote RPCs to the INDEX_TABLE regions on RS3
and RS2 respectively.

The RPC from RS3 to RS2 cannot proceed because the index handler T1 on RS2,
the only one that could service this call, is waiting on its own RPC to RS3 to
finish.
The RPC from RS2 to RS3 cannot proceed because the index handler T1' on RS3,
the only one that could service this call, is waiting on its own RPC to RS2 to
finish.

Deadlock. 
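The interleaving above can be reproduced in miniature with plain Java
executors. This is a hypothetical simulation, not Phoenix or HBase code: each
region server's handler pools are modelled as single-thread executors,
`indexWrite` stands in for the remote INDEX_TABLE RPC, the `indexPriority` flag
models the scan RPC carrying INDEX priority, and a latch forces both scans to
occupy their handlers before either issues its remote call. A 2-second timeout
detects the deadlock instead of hanging forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation of the PHOENIX-3983 scenario; not Phoenix or HBase code.
public class IndexHandlerDeadlock {

    static final class RegionServer {
        final String name;
        // One index handler and one regular handler per server, as in the scenario.
        final ExecutorService indexHandlers = Executors.newSingleThreadExecutor();
        final ExecutorService defaultHandlers = Executors.newSingleThreadExecutor();
        RegionServer peer; // server hosting the index region our data writes update

        RegionServer(String name) { this.name = name; }

        // Remote index-table write: always serviced by an index handler.
        Future<String> indexWrite() {
            return indexHandlers.submit(() -> "index row written on " + name);
        }

        // Rebuild scan: indexPriority models the scan RPC carrying INDEX priority
        // because ServerRpcControllerFactory is configured on the caller.
        Future<String> rebuildScan(boolean indexPriority, CountDownLatch bothScansRunning) {
            ExecutorService pool = indexPriority ? indexHandlers : defaultHandlers;
            return pool.submit(() -> {
                bothScansRunning.countDown();
                bothScansRunning.await();        // both handlers are now busy with scans
                return peer.indexWrite().get();  // local data write -> remote index RPC; blocks
            });
        }

        void shutdown() {
            indexHandlers.shutdownNow();   // interrupts any handler stuck in get()
            defaultHandlers.shutdownNow();
        }
    }

    // Returns true if both rebuild scans finish within the timeout.
    static boolean rebuildCompletes(boolean indexPriority) {
        RegionServer rs2 = new RegionServer("RS2");
        RegionServer rs3 = new RegionServer("RS3");
        rs2.peer = rs3;
        rs3.peer = rs2;
        CountDownLatch bothScansRunning = new CountDownLatch(2);
        try {
            Future<String> scanOnRs2 = rs2.rebuildScan(indexPriority, bothScansRunning);
            Future<String> scanOnRs3 = rs3.rebuildScan(indexPriority, bothScansRunning);
            scanOnRs2.get(2, TimeUnit.SECONDS);
            scanOnRs3.get(2, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // each index handler is waiting on the other: deadlock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            rs2.shutdown();
            rs3.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("scans on index handlers complete:   " + rebuildCompletes(true));
        System.out.println("scans on default handlers complete: " + rebuildCompletes(false));
    }
}
```

Running `main` should report `false` for the index-handler case (after the
timeout) and `true` for the default-handler case: once the scans occupy the
regular handlers, each server's lone index handler stays free to service the
incoming index write.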

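The fix is to *unset* the ServerRpcControllerFactory for the index rebuild
scans, so that on the data table region servers these scans are handled by the
default RPC handlers and *not* by the index RPC handlers. The index handlers
then remain free to service the remote index writes.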
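On the rebuild side, this amounts to building the scan client from a copy of
the server configuration with the controller-factory setting removed. The
sketch below is hypothetical: a `Map` stands in for Hadoop's `Configuration`,
`rebuildScanConfig` is an illustrative name rather than the method in the
actual patch, and the key is assumed to be `hbase.rpc.controllerfactory.class`,
the hbase-site.xml property Phoenix's secondary-indexing setup uses to install
ServerRpcControllerFactory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Hadoop's Configuration.
public class RebuildClientConfig {
    // Assumed key that installs ServerRpcControllerFactory on the region servers.
    static final String CONTROLLER_FACTORY_KEY = "hbase.rpc.controllerfactory.class";

    // Returns the configuration the rebuild scan client should use.
    static Map<String, String> rebuildScanConfig(Map<String, String> serverConfig) {
        Map<String, String> copy = new HashMap<>(serverConfig); // leave the server's own config untouched
        copy.remove(CONTROLLER_FACTORY_KEY); // unset: scans fall back to default priority
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put(CONTROLLER_FACTORY_KEY,
                "org.apache.phoenix.ipc.controller.ServerRpcControllerFactory");
        server.put("hbase.zookeeper.quorum", "zk1");

        Map<String, String> rebuild = rebuildScanConfig(server);
        System.out.println("rebuild client has factory: " + rebuild.containsKey(CONTROLLER_FACTORY_KEY));
        System.out.println("server still has factory:   " + server.containsKey(CONTROLLER_FACTORY_KEY));
    }
}
```

Copying before removing is the design point: only the connection used for the
rebuild scans loses the factory, while the region server's own configuration,
and presumably the priority of its index-write RPCs, stays as configured.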

Many thanks to [~vincentpoon] for his help in debugging and identifying the 
issue. 

FYI, [~lhofhansl]. 


> Index rebuild scans should not be using the ServerRpcControllerFactory
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-3983
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3983
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)