[ 
https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778758#comment-17778758
 ] 

Viraj Jasani commented on HBASE-20881:
--------------------------------------

[~zhangduo] IIUC, the only reason we had to introduce the ABNORMALLY_CLOSED 
state is this: when a region is already in RIT and the target server it is 
assigned to (or is being assigned to) crashes, SCP has to interrupt the old 
TRSP and create new TRSPs to assign all regions that were previously hosted by 
that server. However, any region that was already in transition might require 
manual intervention, because SCP cannot be certain at which step of the 
previous TRSP the region was stuck.

For SCP, any RIT on a dead server is a complex state to deal with, because it 
cannot know for certain whether the region was stuck in a coproc hook on the 
host or while making an RPC call to a remote server, nor what the outcome of 
that RPC call was.

Does this seem correct? We were thinking of digging into this in more detail 
to see whether there are any cases in which we can convert the region state to 
CLOSED rather than ABNORMALLY_CLOSED and thereby avoid operator intervention, 
but I fear we might introduce double assignment of regions if this is not done 
carefully.
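
To make the question concrete, here is a rough sketch of the conservative rule 
as described above. All names (CrashStateSketch, State, stateAfterServerCrash) 
are hypothetical and are not the actual HBase classes or methods; the state 
set is simplified, and the rule itself is an assumption based on the 
understanding stated in this comment.

```java
/** Hypothetical sketch only; none of these names are HBase APIs. */
final class CrashStateSketch {

  /** Simplified region states; the real state set is larger. */
  enum State { OPEN, OPENING, CLOSING, CLOSED, ABNORMALLY_CLOSED }

  private CrashStateSketch() {}

  /**
   * Returns the state a region would be given after a server crash under
   * the conservative rule assumed in this sketch.
   *
   * @param current               state recorded for the region before the crash
   * @param targetedCrashedServer true if the region was hosted on, or was
   *                              being assigned to, the crashed server
   */
  static State stateAfterServerCrash(State current, boolean targetedCrashedServer) {
    if (!targetedCrashedServer) {
      // The crash did not involve this region, so its state is left alone.
      return current;
    }
    // A region in transition falls here too: it is unknown which step of the
    // previous procedure completed (coproc hook, remote RPC and its outcome),
    // so the region is not treated as cleanly CLOSED. Any "safe to mark
    // CLOSED" special case would have to be added before this line, and only
    // if it provably cannot lead to double assignment.
    return State.ABNORMALLY_CLOSED;
  }
}
```

For example, stateAfterServerCrash(State.OPENING, true) yields 
ABNORMALLY_CLOSED, while a region on an unaffected server keeps its state. 
The open question is whether any input could safely return CLOSED instead.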

> Introduce a region transition procedure to handle all the state transition 
> for a region
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-20881
>                 URL: https://issues.apache.org/jira/browse/HBASE-20881
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.0
>
>         Attachments: HBASE-20881-branch-2-v1.patch, 
> HBASE-20881-branch-2-v2.patch, HBASE-20881-branch-2.patch, 
> HBASE-20881-v1.patch, HBASE-20881-v10.patch, HBASE-20881-v11.patch, 
> HBASE-20881-v12.patch, HBASE-20881-v13.patch, HBASE-20881-v13.patch, 
> HBASE-20881-v14.patch, HBASE-20881-v14.patch, HBASE-20881-v15.patch, 
> HBASE-20881-v16.patch, HBASE-20881-v2.patch, HBASE-20881-v3.patch, 
> HBASE-20881-v4.patch, HBASE-20881-v4.patch, HBASE-20881-v5.patch, 
> HBASE-20881-v6.patch, HBASE-20881-v7.patch, HBASE-20881-v7.patch, 
> HBASE-20881-v8.patch, HBASE-20881-v9.patch, HBASE-20881.patch
>
>
> Now we have an AssignProcedure, an UnassignProcedure, and also a 
> MoveRegionProcedure which schedules an AssignProcedure and an 
> UnassignProcedure to move a region. This makes the logic a bit complicated, as 
> MRP is not a RIT, so when SCP cannot interrupt it directly...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
