Amitanand Aiyer created HBASE-8281:
--------------------------------------

             Summary: Unassigned regions: dropped messages from Master to RS
                 Key: HBASE-8281
                 URL: https://issues.apache.org/jira/browse/HBASE-8281
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.89-fb
            Reporter: Amitanand Aiyer


We have seen a couple of scenarios where a transient network issue between the
RS and the Master results in regions becoming unassigned (and staying
unassigned) until someone intervenes manually with hbck -fix.

The events occur as follows. 

The RS checks in with a regionServerReport.
  The Master wants to assign a region to the RS, so it adds a MSG_REGION_OPEN
message to the reply and marks the region as PENDING_OPEN.

  The messages from the Master to the RS are not delivered because of a network error.

The network heals, and the RS is able to send regionServerReports again; it is
in good standing with the Master. However, the RS does not know that it has to
open the region, while the Master thinks that the RS is going to open it.

The region remains unassigned until we intervene with hbck.
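To make the failure concrete, here is a minimal Java sketch of the sequence
above. It is a simplified, hypothetical model, not HBase's actual master code:
the classes DroppedOpenDemo and NaiveMaster, the State enum, and the onReport()
method are illustrative names, and a single RS is assumed. The point it shows
is that the region flips to PENDING_OPEN when the OPEN message is queued, not
when it is delivered, so once that reply is lost no later report repairs it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the HBASE-8281 failure; not HBase's real API.
class DroppedOpenDemo {
    public static void main(String[] args) {
        NaiveMaster master = new NaiveMaster();
        master.assign("region-A");

        // Report 1: the reply carrying MSG_REGION_OPEN is lost on the network.
        List<String> lostReply = master.onReport();
        // Report 2: the network has healed, but the reply carries no OPEN message.
        List<String> laterReply = master.onReport();

        System.out.println("lost reply:  " + lostReply);
        System.out.println("later reply: " + laterReply);
        System.out.println("state:       " + master.state("region-A"));
    }
}

enum State { OFFLINE, PENDING_OPEN, OPEN }

class NaiveMaster {
    private final Map<String, State> states = new HashMap<>();

    // The region is waiting to be assigned.
    void assign(String region) {
        states.put(region, State.OFFLINE);
    }

    State state(String region) {
        return states.get(region);
    }

    // Called when the (single, assumed) RS checks in; returns the messages
    // piggybacked on the reply to that report.
    List<String> onReport() {
        List<String> reply = new ArrayList<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.OFFLINE) {
                reply.add("MSG_REGION_OPEN " + e.getKey());
                // The problematic step: the state changes as soon as the message
                // is queued, whether or not the reply ever reaches the RS.
                e.setValue(State.PENDING_OPEN);
            }
        }
        return reply;
    }
}
```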


Possible fix:
  I think it may be a mistake to unilaterally change the RegionState to
PENDING_OPEN as soon as the Master decides to send the message. Perhaps we
should create an intermediate state in which the Master keeps resending the
OPEN message to the RS until the RS acks it, and update the RegionState to
PENDING_OPEN only after the RS has acked.
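A minimal sketch of the proposed intermediate state, again as a hypothetical
Java model rather than a patch against HBase: OPEN_SENT is an assumed name for
"OPEN sent but not yet acked", and the RS is assumed to list the OPENs it has
received in its next regionServerReport. The Master resends MSG_REGION_OPEN on
every report until the ack arrives, and only then moves the region to
PENDING_OPEN.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed fix; names are illustrative, not HBase's API.
class AckedOpenDemo {
    public static void main(String[] args) {
        AckingMaster master = new AckingMaster();
        master.assign("region-A");

        // Report 1: OPEN is sent, but the reply is lost, so nothing is acked yet.
        List<String> lostReply = master.onReport(Collections.emptySet());
        // Report 2: network healed; still no ack, so the Master resends OPEN.
        List<String> resent = master.onReport(Collections.emptySet());
        // Report 3: the RS acks the OPEN it received with report 2.
        List<String> afterAck = master.onReport(Set.of("region-A"));

        System.out.println(lostReply + " " + resent + " " + afterAck
                + " " + master.state("region-A"));
    }
}

// OPEN_SENT is the assumed intermediate state: sent, but not yet acknowledged.
enum AssignState { OFFLINE, OPEN_SENT, PENDING_OPEN, OPEN }

class AckingMaster {
    private final Map<String, AssignState> states = new HashMap<>();

    void assign(String region) {
        states.put(region, AssignState.OFFLINE);
    }

    AssignState state(String region) {
        return states.get(region);
    }

    // ackedOpens: regions whose OPEN message the RS reports having received
    // (an assumed extension of regionServerReport).
    List<String> onReport(Set<String> ackedOpens) {
        List<String> reply = new ArrayList<>();
        for (Map.Entry<String, AssignState> e : states.entrySet()) {
            String region = e.getKey();
            AssignState s = e.getValue();
            if (s == AssignState.OPEN_SENT && ackedOpens.contains(region)) {
                // Only now is it safe to assume the RS will open the region.
                e.setValue(AssignState.PENDING_OPEN);
            } else if (s == AssignState.OFFLINE || s == AssignState.OPEN_SENT) {
                // Not acked yet (or never sent): (re)send the OPEN message.
                reply.add("MSG_REGION_OPEN " + region);
                e.setValue(AssignState.OPEN_SENT);
            }
        }
        return reply;
    }
}
```

Because an OPEN may now be delivered more than once, the RS would need to treat
a duplicate MSG_REGION_OPEN for a region it is already opening as a no-op.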


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira