[ 
https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109472#comment-16109472
 ] 

Umesh Agashe commented on HBASE-18366:
--------------------------------------

bq. Why this change sir: optional ServerName destination_server = 3;

When destination_server is not specified, the LoadBalancer selects it. This is
useful when a region has to be moved off an RS but the target server should be
chosen by the load balancer.
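
To illustrate the intended semantics of the optional field, here is a minimal sketch. It is not the actual HBase code: the class MoveTargetSketch, the methods resolveTarget() and balancerPick(), and the use of plain strings for server names are all hypothetical, and the "balancer" here simply takes the first online server other than the source.

```java
import java.util.List;
import java.util.Optional;

final class MoveTargetSketch {
    /**
     * Hypothetical stand-in for the load balancer: picks the first online
     * server that is not the source. A real balancer weighs load, locality, etc.
     */
    static String balancerPick(String source, List<String> onlineServers) {
        for (String server : onlineServers) {
            if (!server.equals(source)) {
                return server;
            }
        }
        throw new IllegalStateException("no candidate server other than " + source);
    }

    /**
     * Uses the explicit destination when the caller supplied one; otherwise
     * defers the choice to the (stand-in) balancer.
     */
    static String resolveTarget(String source, Optional<String> destination,
                                List<String> onlineServers) {
        return destination.orElseGet(() -> balancerPick(source, onlineServers));
    }
}
```

With servers rs1, rs2 and rs3 online, a move off rs1 with an explicit destination of rs3 resolves to rs3, while a move with no destination is left to the stand-in balancer.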

[~stack] and I had a discussion about this JIRA, the unit test
TestServerCrashProcedure, and the patches uploaded here. We identified the
following areas for improving the code and fixing bugs:

* Currently, UnassignProcedure returns success when the server carrying a
region is not online. The assumption is that ServerCrashProcedure will handle
log splitting etc. for these regions. When UnassignProcedure completes,
MoveRegionProcedure resumes with AssignProcedure, which can then assign the
region without the prerequisite steps having been performed. The fix is to fail
UnassignProcedure and its parent MoveRegionProcedure if the source server is
not online.
* Embed the logic for selecting the highest-versioned region servers for system
table regions in AssignmentManager.processAssignQueue(). This way, no matter
which section of the code re/assigns system table regions, only the
highest-versioned RSs are considered as target servers.
* Because ServerCrashProcedure handles reassignment of the regions on a crashed
server, don't process those regions through a call to
AssignmentManager.checkIfShouldMoveSystemRegionAsync().
* Modify the LoadBalancer implementation to treat the highest-versioned region
servers as favorites for system table regions.
* Look into refactoring ServerManager to make isServerOnline() and
isServerDead() mutually exclusive.
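
As a rough sketch of what "only the highest-versioned RSs are considered" could mean in the items above: given a map of server name to version string, keep only the servers at the maximum version. This is an illustration under stated assumptions, not the AssignmentManager or LoadBalancer code. The class HighestVersionSketch and its methods are hypothetical, and versions are assumed to be purely numeric and dot-separated (a suffix such as -SNAPSHOT would need extra handling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HighestVersionSketch {
    /** Compares dotted numeric versions (e.g. "1.4.0" vs "2.0.0") component by component. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing components count as 0, so "2.0" equals "2.0.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns the servers running the highest version, sorted by server name. */
    static List<String> highestVersioned(Map<String, String> serverVersions) {
        List<String> best = new ArrayList<>();
        String bestVersion = null;
        // TreeMap gives a deterministic iteration order by server name.
        for (Map.Entry<String, String> e : new TreeMap<>(serverVersions).entrySet()) {
            int cmp = bestVersion == null ? 1 : compareVersions(e.getValue(), bestVersion);
            if (cmp > 0) {
                // Found a strictly newer version: discard earlier candidates.
                best.clear();
                bestVersion = e.getValue();
            }
            if (cmp >= 0) {
                best.add(e.getKey());
            }
        }
        return best;
    }
}
```

During a rolling upgrade with rs1 on 1.4.0 and rs2 and rs3 on 2.0.0, this filter leaves rs2 and rs3 as the candidate targets for system table regions.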

All of these issues are related to AMv2. I will create JIRAs to track them.

Thanks, Umesh


> Fix flaky test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18366
>                 URL: https://issues.apache.org/jira/browse/HBASE-18366
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: hbase-18366.fix1.patch, hbase-18366.fix2.patch
>
>
> It worked for a few days after being enabled with HBASE-18278, but started
> failing after these commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those 
> commits.
> Currently it fails with TableNotFoundException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
