[ 
https://issues.apache.org/jira/browse/HBASE-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264950#comment-13264950
 ] 

Chinna Rao Lalam commented on HBASE-5806:
-----------------------------------------


For #1 above, the RegionServer crashed in
SplitTransaction.createDaughters(Server, RegionServerServices) while removing
the parent region from the online regions:
{code}
    if (!testing) {
      services.removeFromOnlineRegions(this.parent.getRegionInfo().getEncodedName());
    }
{code}

Wherever the RegionServer crashes, its ephemeral node is deleted and the master
receives a nodeDeleted() notification, which clears the region from RIT.

However, the ServerShutdownHandler ran before the nodeDeleted() event for the
region node was processed, as the logs below show:

{noformat}
2012-04-06 14:35:08,841 DEBUG 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Removed 
test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. from list of regions to 
assign because in RIT; region state: SPLITTING

2012-04-06 14:35:12,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Ephemeral node deleted, regionserver crashed?, clearing from RIT; 
rs=test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. state=SPLITTING, 
ts=1333703059260, server=HOST-10-18-40-25,60020,1333695183392
{noformat}

In this situation the code below returned that region among the regions in
transition

{code}
  List<RegionState> regionsInTransition =
        this.services.getAssignmentManager().
          processServerShutdown(this.serverName);
{code}

and because it satisfies !rit.isClosing() && !rit.isPendingClose(), the region
is removed from hris:

{code}
      for (RegionState rit : regionsInTransition) {
        if (!rit.isClosing() && !rit.isPendingClose()) {
          LOG.debug("Removed " + rit.getRegion().getRegionNameAsString() +
          " from list of regions to assign because in RIT; region state: " +
          rit.getState());
          if (hris != null) hris.remove(rit.getRegion());
        }
      }
{code}
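
To make the effect of that loop concrete, here is a minimal standalone sketch. It is a toy model, not the HBase code: the State enum, the Rit class, and regionsToAssign() are hypothetical stand-ins for RegionState and the SSH logic, and regions are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RitFilterSketch {
    // Hypothetical subset of the states a region in transition can have.
    enum State { SPLITTING, CLOSING, PENDING_CLOSE }

    // Hypothetical stand-in for RegionState: a region name plus its state.
    static final class Rit {
        final String region;
        final State state;

        Rit(String region, State state) {
            this.region = region;
            this.state = state;
        }

        boolean isClosing() { return state == State.CLOSING; }
        boolean isPendingClose() { return state == State.PENDING_CLOSE; }
    }

    // Same shape as the quoted loop: any region in transition that is
    // neither CLOSING nor PENDING_CLOSE is dropped from the assign list.
    static List<String> regionsToAssign(List<String> hris, List<Rit> regionsInTransition) {
        List<String> result = new ArrayList<>(hris);
        for (Rit rit : regionsInTransition) {
            if (!rit.isClosing() && !rit.isPendingClose()) {
                result.remove(rit.region);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hris = Arrays.asList("parent", "other");
        List<Rit> rits = Arrays.asList(new Rit("parent", State.SPLITTING));
        // The SPLITTING parent is filtered out, so nothing will reassign it.
        System.out.println(regionsToAssign(hris, rits)); // prints [other]
    }
}
```

In this model a parent stuck in SPLITTING is removed from the list, which matches the "Removed ... from list of regions to assign" log line above.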
The fix in SSH addresses #1.
#2 was caused by HBASE-5615; however, HBASE-5615 has been reverted.
#3 occurs when the master restarts after splitting is done but before the
CatalogJanitor (CJ) has cleared the region from META. So while rebuilding the
user regions we ensure that the offlined parent region is not taken into
account again.

#2 and #3 are handled together in this patch, so the fix solves both problems.
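
As a rough illustration of the idea behind #3 (not the patch itself), the sketch below skips an offlined split parent while rebuilding the region list from META rows. MetaRow and rebuildUserRegions() are hypothetical names, and treating "offline and split" as the marker of a split parent is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RebuildUserRegionsSketch {
    // Hypothetical stand-in for a region row read from META.
    static final class MetaRow {
        final String region;
        final boolean offline;
        final boolean split;

        MetaRow(String region, boolean offline, boolean split) {
            this.region = region;
            this.offline = offline;
            this.split = split;
        }
    }

    // Assumption: a row that is both offline and split is a parent whose
    // split has completed but which has not yet been cleaned from META.
    static List<String> rebuildUserRegions(List<MetaRow> rows) {
        List<String> regions = new ArrayList<>();
        for (MetaRow row : rows) {
            if (row.offline && row.split) {
                continue; // do not take the offlined parent into account again
            }
            regions.add(row.region);
        }
        return regions;
    }

    public static void main(String[] args) {
        List<MetaRow> rows = Arrays.asList(
            new MetaRow("parent", true, true),
            new MetaRow("daughterA", false, false),
            new MetaRow("daughterB", false, false));
        System.out.println(rebuildUserRegions(rows)); // prints [daughterA, daughterB]
    }
}
```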
                
> Handle split region related failures on master restart and RS restart
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5806
>                 URL: https://issues.apache.org/jira/browse/HBASE-5806
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Chinna Rao Lalam
>             Fix For: 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: HBASE-5806.patch
>
>
> This issue is raised to solve problems that arise when a region split has 
> happened only partially and the region node in ZK, which is in 
> RS_ZK_REGION_SPLITTING or RS_ZK_REGION_SPLIT, is not yet processed.
> This also tries to address HBASE-5615.
