Sandeep Guggilam created HBASE-25032:
----------------------------------------

             Summary: Wait for region server to become online before 
considering it online in Master
                 Key: HBASE-25032
                 URL: https://issues.apache.org/jira/browse/HBASE-25032
             Project: HBase
          Issue Type: Bug
            Reporter: Sandeep Guggilam
            Assignee: Sandeep Guggilam


As part of RS startup, the RS reports for duty to the Master. The Master 
acknowledges the request and adds the RS to the onlineServers list so that 
regions can be assigned to it.

Once the Master acknowledges the reportForDuty and sends back the response, the 
RS performs further initialization, such as setting up replication sources, 
before it becomes online. However, initializing replication sources can stall 
when the RS is unable to connect to peer clusters because of a Kerberos 
configuration problem, which delays the RS becoming online by around 20 
minutes.

 

Since the Master already considers the RS online, it tries to assign regions to 
it, which fails with a ServerNotRunningYet exception. The Master then tries to 
unassign the region, which fails with the same exception and leaves the region 
in the FAILED_CLOSE state.

 

It would be good to check whether the RS is ready to accept assignment requests 
before adding it to the online servers list. This would account for delays such 
as the one described above.
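
The proposed check could be sketched roughly as follows. This is a standalone 
illustration, not HBase code: the class ReadinessGatedServerRegistry and its 
methods reportOnline and isAssignable are hypothetical names, and the real 
Master tracks servers with richer types than plain strings. The idea is only 
that reporting for duty puts a server in a pending set, and a separate 
"I am online" signal is needed before the server becomes eligible for 
assignments.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model (not an HBase class): a two-phase server registry.
class ReadinessGatedServerRegistry {
    // Servers that reported for duty but have not finished initialization.
    private final Set<String> pendingServers = new HashSet<>();
    // Servers that confirmed they can accept region assignments.
    private final Set<String> onlineServers = new HashSet<>();

    // Phase 1: the server reports for duty. It is acknowledged but is
    // deliberately not added to the online set yet.
    synchronized void reportForDuty(String serverName) {
        if (!onlineServers.contains(serverName)) {
            pendingServers.add(serverName);
        }
    }

    // Phase 2: the server signals that startup (for example, replication
    // source initialization) is complete. Returns true if the server is
    // online after the call, false if it never reported for duty.
    synchronized boolean reportOnline(String serverName) {
        if (pendingServers.remove(serverName)) {
            onlineServers.add(serverName);
        }
        return onlineServers.contains(serverName);
    }

    // Assignment logic would consult this before sending regions to a server.
    synchronized boolean isAssignable(String serverName) {
        return onlineServers.contains(serverName);
    }

    public static void main(String[] args) {
        ReadinessGatedServerRegistry registry = new ReadinessGatedServerRegistry();
        String rs = "rs1.example.com,16020,1"; // made-up server name

        registry.reportForDuty(rs);
        System.out.println(registry.isAssignable(rs)); // false: still initializing

        registry.reportOnline(rs);
        System.out.println(registry.isAssignable(rs)); // true: ready for regions
    }
}
```

With a gate like this, a 20 minute stall during RS initialization would leave 
the server in the pending set, so the Master would not attempt assignments 
that are bound to fail.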



