[ https://issues.apache.org/jira/browse/HBASE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028512#comment-15028512 ]
Abhishek Singh Chouhan commented on HBASE-14889:
------------------------------------------------

[~pankaj2461] Not at the moment. Feel free to take this up if you want.

> Region stuck in transition in OPEN state indefinitely in corner scenario
> ------------------------------------------------------------------------
>
>                 Key: HBASE-14889
>                 URL: https://issues.apache.org/jira/browse/HBASE-14889
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Abhishek Singh Chouhan
>
> During a failure scenario in which an RS dies and the bulk assigner (BA) is
> assigning its regions to other RSs, if another RS (to which some regions are
> being moved) dies while a region on it is in pending open state, we end up in
> a situation where two bulk assigners try to assign the same region to the
> same RS.
> The following happened -
> 1. While one BA was opening the region, the second one sees it in pending
> open state, retries and calls unassign(...), thereby sending a CLOSE RPC to
> the RS.
> 2. The RS meanwhile has already opened the region, hence changing the znode
> state to RS_ZK_REGION_OPENED, which triggers an event on the master.
> 3. On the master, after the unassign is successful, we go on to delete the
> znode, change the region state to Pending open and send an open RPC to the RS.
> 4. The earlier triggered event now sees the state as Pending open and happily
> changes it to OPEN, but is unable to delete the znode, which by this time is
> not in RS_ZK_REGION_OPENED state but in M_ZK_REGION_OFFLINE state. Hence
> the region remains in transition in the OPEN state.
> 5. The RS goes on to change the znode states and successfully opens the
> region (changes the znode state to RS_ZK_REGION_OPENED).
> 6. This again triggers an event on the master, but this time, since the state
> is OPEN, the following code path is taken
> {noformat}
>         case RS_ZK_REGION_OPENED:
>           // Should see OPENED after OPENING but possible after PENDING_OPEN.
>           if (regionState == null
>               || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
>             LOG.warn("Received OPENED for " + prettyPrintedRegionName
>               + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
>               + regionStates.getRegionState(encodedName));
>             if (regionState != null) {
>               // Close it without updating the internal region states,
>               // so as not to create double assignments in unlucky scenarios
>               // mentioned in OpenRegionHandler#process
>               unassign(regionState.getRegion(), null, -1, null, false, sn);
>             }
>             return;
>           }
> {noformat}
> We call unassign here with transitionInZK=false and state=null.
> 7. The RS closes the region but doesn't update ZK, and the state is not
> changed on the master either. The region remains in transition in OPEN
> state when it's actually closed. We have to restart the RS, after which the
> region opens correctly on some other RS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
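
The interleaving in steps 1 to 7 of the description can be replayed with a small toy model. This is only a sketch for reasoning about the ordering, not HBase code: the class `StuckRegionReplay`, its fields and its methods are all hypothetical, the master's region state and the znode are reduced to two enums, and the ZK event queue is modelled simply by the order in which the methods are called.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay of the HBASE-14889 sequence. Hypothetical names, not HBase APIs. */
class StuckRegionReplay {
    enum MasterState { PENDING_OPEN, OPEN }
    enum Znode { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED, DELETED }

    MasterState masterState = MasterState.PENDING_OPEN; // first BA already sent the open RPC
    Znode znode = Znode.M_ZK_REGION_OFFLINE;
    boolean inTransition = true;   // master still lists the region as in transition
    boolean openOnRs = false;      // whether the RS is actually serving the region
    final List<String> log = new ArrayList<>();

    /** RS finishes opening the region and publishes OPENED in ZK (steps 2 and 5). */
    void rsOpens() {
        openOnRs = true;
        znode = Znode.RS_ZK_REGION_OPENED;
    }

    /** Second BA retries: unassign, fresh OFFLINE znode, open RPC resent (steps 1 and 3). */
    void masterRetriesAssign() {
        openOnRs = false;                       // CLOSE RPC succeeded on the RS
        znode = Znode.M_ZK_REGION_OFFLINE;      // old znode deleted, new one created
        masterState = MasterState.PENDING_OPEN;
    }

    /** Master handles an RS_ZK_REGION_OPENED event, which may be stale (steps 4, 6, 7). */
    void masterHandlesOpenedEvent() {
        if (masterState != MasterState.PENDING_OPEN) {
            // Mirrors the quoted branch: close on the RS, touch neither ZK nor master state.
            log.add("unexpected OPENED while " + masterState + ", closing on RS");
            openOnRs = false;
            return;
        }
        masterState = MasterState.OPEN;
        if (znode == Znode.RS_ZK_REGION_OPENED) {
            znode = Znode.DELETED;
            inTransition = false;
        } else {
            log.add("cannot delete znode in state " + znode);
        }
    }

    public static void main(String[] args) {
        StuckRegionReplay r = new StuckRegionReplay();
        r.rsOpens();                  // step 2: OPENED event is queued on the master
        r.masterRetriesAssign();      // steps 1 and 3
        r.masterHandlesOpenedEvent(); // step 4: stale event marks the region OPEN
        r.rsOpens();                  // step 5
        r.masterHandlesOpenedEvent(); // steps 6 and 7
        System.out.println(r.masterState + " inTransition=" + r.inTransition
            + " openOnRs=" + r.openOnRs);
    }
}
```

In this model the replay ends with the master holding the region as OPEN and still in transition while the RS no longer serves it, which matches the stuck state described in step 7.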