[ https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348004#comment-17348004 ]
Duo Zhang commented on HBASE-25032:
-----------------------------------

I was wondering whether we need to commit this to branch-2.4 and branch-2.3. This is a big behavior change and should not be classified as a simple bug.

> Do not assign regions to region server which has not called regionServerReport yet
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-25032
>                 URL: https://issues.apache.org/jira/browse/HBASE-25032
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: master, regionserver
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6
>
>
> As part of RS startup, the RS reports for duty to the Master. The Master
> acknowledges the request and adds the RS to the onlineServers list so that
> regions can be assigned to it.
>
> Once the Master acknowledges the reportForDuty and sends back the response,
> the RS does further work, such as initializing replication sources, before
> it becomes online. Sometimes initializing replication sources runs into
> trouble because the RS cannot connect to peer clusters due to a Kerberos
> configuration problem, and the RS then takes around 20 minutes to become
> online.
>
> Since the Master already considers the RS online, it tries to assign
> regions to it, which fails with a ServerNotRunningYet exception. The Master
> then tries to unassign, which fails with the same exception and leaves the
> region in the FAILED_CLOSE state.
>
> It would be good to check whether the RS is ready to accept assignment
> requests before adding it to the online servers list, which would account
> for delays such as the one described above.
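
The readiness check asked for in the description could be sketched roughly as follows. This is a toy model only, not HBase's actual master code: the class ReadinessGate, its methods, and the server name used in main are hypothetical names chosen for illustration. The assumption is that the master tracks each RS from reportForDuty onward but offers it as an assignment target only after its first regionServerReport.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of gating region assignment on the first regionServerReport. */
final class ReadinessGate {
    // Server name -> true once the server has sent its first regionServerReport.
    private final Map<String, Boolean> servers = new LinkedHashMap<>();

    /** The RS has reported for duty: it is known, but not yet an assignment target. */
    void reportForDuty(String serverName) {
        servers.putIfAbsent(serverName, Boolean.FALSE);
    }

    /** The RS has sent a regionServerReport, so it has finished initializing. */
    void regionServerReport(String serverName) {
        servers.put(serverName, Boolean.TRUE);
    }

    /** Only servers that have sent a report are eligible to receive regions. */
    List<String> assignmentCandidates() {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : servers.entrySet()) {
            if (entry.getValue()) {
                ready.add(entry.getKey());
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        ReadinessGate gate = new ReadinessGate();
        String rs = "rs1.example.org,16020,1"; // made-up server name

        gate.reportForDuty(rs);
        // Still initializing (e.g. replication sources): no candidates yet.
        System.out.println(gate.assignmentCandidates());

        gate.regionServerReport(rs);
        // The first report has arrived: the server may now receive regions.
        System.out.println(gate.assignmentCandidates());
    }
}
```

With a gate like this, a slow replication-source initialization only delays assignment to that RS. The master never sends it an assign or unassign request that would fail with ServerNotRunningYet.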