[ https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109472#comment-16109472 ]
Umesh Agashe commented on HBASE-18366:
--------------------------------------

bq. Why this change sir: optional ServerName destination_server = 3;

When destination_server is not specified, the LoadBalancer selects it. This is useful when a region must be moved off a region server but the target server should be chosen by the load balancer.

[~stack] and I discussed this JIRA, the unit test TestServerCrashProcedure, and the patches uploaded here. We identified the following areas for improving code and fixing bugs:

* Currently UnassignProcedure returns success when the server carrying a region is not online. The assumption is that ServerCrashProcedure will handle splitting logs etc. for these regions. When UnassignProcedure completes, MoveRegionProcedure resumes with AssignProcedure, which can then assign the region before those prerequisite steps have run. The fix is to fail UnassignProcedure and the parent MoveRegionProcedure if the source server is not online.
* Embed the logic for selecting the highest versioned region servers for system table regions in AssignmentManager.processAssignQueue(). That way, wherever in the code system table regions are assigned or reassigned, only the highest versioned region servers are considered as target servers.
* As ServerCrashProcedure handles reassignment of regions on a crashed server, don't process the regions on a crashed server through the call to AssignmentManager.checkIfShouldMoveSystemRegionAsync().
* Modify the LoadBalancer implementation to treat the highest versioned region servers as favorites for system table regions.
* Look into refactoring ServerManager to make isServerOnline() and isServerDead() mutually exclusive.

All these issues are related to AMv2. I will create JIRAs to track them.
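To illustrate the "highest versioned region servers" selection mentioned in the list above, here is a minimal standalone sketch in Java. It is not the AssignmentManager or LoadBalancer code: the class HighestVersionServers, the method names, and the plain map of server name to version string are all hypothetical, and versions are assumed to be purely numeric dotted strings such as "2.0.0".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of HBase.
class HighestVersionServers {

    // Compare dotted numeric versions, e.g. "1.4.10" vs "2.0.0".
    // Assumption: every component is a plain integer (no "-SNAPSHOT" suffixes).
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components are treated as 0, so "2.0" equals "2.0.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Return the names of all servers running the highest version,
    // in the iteration order of the input map.
    static List<String> select(Map<String, String> serverVersions) {
        List<String> best = new ArrayList<>();
        String bestVersion = null;
        for (Map.Entry<String, String> e : serverVersions.entrySet()) {
            int c = bestVersion == null ? 1 : compareVersions(e.getValue(), bestVersion);
            if (c > 0) {
                // Found a newer version: discard the candidates collected so far.
                best.clear();
                bestVersion = e.getValue();
            }
            if (c >= 0) {
                best.add(e.getKey());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> servers = new LinkedHashMap<>();
        servers.put("rs1", "1.4.0");
        servers.put("rs2", "2.0.0");
        servers.put("rs3", "2.0.0");
        // Only rs2 and rs3 would be candidates for system table regions.
        System.out.println(select(servers));
    }
}
```

Doing this filtering in one place, before a target server is picked, is what would let every assignment path for system table regions see the same candidate set.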

Thanks,
Umesh

> Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18366
>                 URL: https://issues.apache.org/jira/browse/HBASE-18366
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: hbase-18366.fix1.patch, hbase-18366.fix2.patch
>
>
> It worked for a few days after enabling it with HBASE-18278. But started failing after commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those commits.
> Currently it fails with TableNotFoundException.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)