[jira] [Commented] (HBASE-3937) Region PENDING-OPEN timeout with un-expected ZK node state leads to an endless loop

Jieshan Bean (JIRA) Tue, 31 May 2011 19:36:35 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041952#comment-13041952
 ]


Jieshan Bean commented on HBASE-3937:
-------------------------------------

How about modifying the PENDING_OPEN case as follows? Or should we instead 
modify the assign method, as J-D suggested? 
{noformat}
 case PENDING_OPEN:
                LOG.info("Region has been PENDING_OPEN for too " +
                    "long, reassigning region=" +
                    regionInfo.getRegionNameAsString());
                
                // If the ZK node is in any state other than OFFLINE (for
                // example OPENING), force it back to OFFLINE before reassigning.
                String pendingNode = ZKAssign.getNodeName(watcher,
                    regionInfo.getEncodedName());
                Stat pendingStat = new Stat();
                try {
                  RegionTransitionData pendingData = ZKAssign.getDataNoWatch(
                      watcher, pendingNode, pendingStat);
                  if ((null != pendingData)
                      && (pendingData.getEventType() != EventType.M_ZK_REGION_OFFLINE)) {
                    // Remember the old state for logging; pendingData is replaced below.
                    EventType oldType = pendingData.getEventType();
                    pendingData = new RegionTransitionData(
                        EventType.M_ZK_REGION_OFFLINE,
                        regionInfo.getRegionName(), master.getServerName());
                    // Version-checked write: only applies if the node is unchanged
                    // since it was read above.
                    if (ZKUtil.setData(watcher, pendingNode,
                        pendingData.getBytes(), pendingStat.getVersion())) {
                      // Node is now OFFLINE, let's trigger another assignment
                      ZKUtil.getDataAndWatch(watcher, pendingNode);
                      LOG.info("Successfully transitioned region="
                          + regionInfo.getRegionNameAsString() + " from "
                          + oldType
                          + " to OFFLINE and forcing a new assignment.");
                    }
                  }
                } catch (KeeperException ke) {
                  LOG.error("ZK KeeperException timing out PENDING_OPEN region", ke);
                }
                
                assigns.put(regionState.getRegion(), Boolean.TRUE);
                break;
{noformat}
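
To illustrate why the snippet passes pendingStat.getVersion() to the write, here is a minimal in-memory sketch of a version-checked update. VersionedNodeSketch and its methods are hypothetical names for illustration only, not HBase or ZooKeeper classes, and the sketch simply returns false on a version mismatch; how the real call reports a mismatch is not modeled here.

```java
/** Hypothetical in-memory stand-in for a ZK node with a data version. */
final class VersionedNodeSketch {
    enum NodeState { OFFLINE, OPENING }

    private NodeState state = NodeState.OFFLINE;
    private int version = 0;

    synchronized NodeState getState() { return state; }

    synchronized int getVersion() { return version; }

    /** Conditional write: applied only if expectedVersion equals the current version. */
    synchronized boolean setState(NodeState next, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // someone else wrote since the caller's read
        }
        state = next;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch();
        int seen = node.getVersion();                         // master reads the node
        node.setState(NodeState.OPENING, node.getVersion());  // region server writes first
        // The master's write with the stale version is rejected.
        System.out.println(node.setState(NodeState.OFFLINE, seen));              // prints false
        // A write with the current version is accepted.
        System.out.println(node.setState(NodeState.OFFLINE, node.getVersion())); // prints true
    }
}
```

Under this model, if the region server changes the node between the master's read and its write, the forced OFFLINE is skipped for that round and a later timeout can retry.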

> Region PENDING-OPEN timeout with un-expected ZK node state leads to an 
> endless loop
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-3937
>                 URL: https://issues.apache.org/jira/browse/HBASE-3937
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
>
>
> The scenario in which this problem occurred:
> 1. HMaster assigned region A to RS1, so the RegionState was set to 
> PENDING_OPEN.
> 2. Because there were too many open requests, the open process on RS1 was 
> blocked.
> 3. Some time later, TimeoutMonitor found that the assignment of A had timed 
> out. Because the RegionState was PENDING_OPEN, it went into the following 
> handler (which just puts the region into a set of regions waiting to be 
> assigned):
>    case PENDING_OPEN:
>       LOG.info("Region has been PENDING_OPEN for too " +
>           "long, reassigning region=" +
>           regionInfo.getRegionNameAsString());
>       assigns.put(regionState.getRegion(), Boolean.TRUE);
>       break;
> So in this case we assume that the ZK node state is OFFLINE. In the normal 
> case, that is true.
> 4. But before the actual reassignment, RS1 processed its pending request, 
> which interfered with the new assignment: it updated the ZK node state from 
> OFFLINE to OPENING. 
> 5. The new assignment then started and sent the region to RS2 to open. While 
> opening, RS2 must update the ZK node state from OFFLINE to OPENING. Because 
> the current state was already OPENING, this operation failed.
> So this region could never be opened successfully.
> So I think that, to avoid this problem, in the PENDING_OPEN case of 
> TimeoutMonitor we should first transition the ZK node state to OFFLINE.
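
The steps above can be replayed with a small, self-contained model. This is only a sketch under simplifying assumptions: PendingOpenRaceSketch, RegionNode and roundsUntilRs2Opens are hypothetical names, the node holds nothing but its state, RS1 is assumed to leave the node at OPENING, and every timeout round is followed by exactly one open attempt on RS2.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the PENDING_OPEN timeout race; not HBase code. */
final class PendingOpenRaceSketch {
    enum NodeState { OFFLINE, OPENING }

    /** Stand-in for the region's ZK node; holds only the state. */
    static final class RegionNode {
        private final AtomicReference<NodeState> state =
            new AtomicReference<>(NodeState.OFFLINE);

        /** Region-server transition: succeeds only from the expected state. */
        boolean transition(NodeState expected, NodeState next) {
            return state.compareAndSet(expected, next);
        }

        /** Master-side reset, the step proposed for the timeout handler. */
        void forceOffline() { state.set(NodeState.OFFLINE); }

        NodeState get() { return state.get(); }
    }

    /**
     * Returns the timeout round in which RS2 manages OFFLINE -> OPENING,
     * or -1 if it never does within maxRounds.
     */
    static int roundsUntilRs2Opens(boolean resetOnTimeout, int maxRounds) {
        RegionNode node = new RegionNode(); // step 1: node is OFFLINE
        for (int round = 1; round <= maxRounds; round++) {
            // step 3: TimeoutMonitor handles PENDING_OPEN
            if (resetOnTimeout && node.get() != NodeState.OFFLINE) {
                node.forceOffline();
            }
            if (round == 1) {
                // step 4: RS1's delayed request runs before the reassignment
                node.transition(NodeState.OFFLINE, NodeState.OPENING);
            }
            // step 5: RS2 tries to move the node OFFLINE -> OPENING
            if (node.transition(NodeState.OFFLINE, NodeState.OPENING)) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(roundsUntilRs2Opens(false, 5)); // prints -1
        System.out.println(roundsUntilRs2Opens(true, 5));  // prints 2
    }
}
```

In this model the first timeout still loses the race, because the node is OFFLINE when the handler looks at it. Without the reset, every later round fails the same way; with it, the second timeout finds the node in OPENING, forces it to OFFLINE, and RS2 can proceed.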

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
