[ https://issues.apache.org/jira/browse/HBASE-15058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082498#comment-15082498 ]
Heng Chen commented on HBASE-15058:
-----------------------------------

LGTM.

One question: should we do the same thing for the merge process? I notice that the quota info is updated when the state is MERGE_REVERTED. I think it should be updated when the state is READY_TO_MERGE, as in the split process, and it also needs to be reverted when the merge fails. wdyt?

> AssignmentManager should account for unsuccessful split correctly which initially passes quota check
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15058
>                 URL: https://issues.apache.org/jira/browse/HBASE-15058
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 1.3.0, 1.2.1
>
>         Attachments: 15058-branch-1-v1.txt, 15058-branch-1-v2.txt
>
>
> When a region split doesn't pass the quota check, we see an exception similar to the following:
> {code}
> 2015-12-29 16:07:33,653 INFO [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
> java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
>   at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
>   at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>   at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> {code}
> However, a region split may also fail in subsequent SplitTransactionPhase's in stepsBeforePONR().
> Currently in branch-1, the distinction among the following TransitionCode's is not clear in AssignmentManager#onRegionTransition():
> {code}
>       case SPLIT_PONR:
>       case SPLIT:
>       case SPLIT_REVERTED:
>         errorMsg =
>             onRegionSplit(serverName, code, hri,
>               HRegionInfo.convert(transition.getRegionInfo(1)),
>               HRegionInfo.convert(transition.getRegionInfo(2)));
>         if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
>           try {
>             regionStateListener.onRegionSplitReverted(hri);
> {code}
> onRegionSplit() handles the above 3 TransitionCode's. However, errorMsg is normally null (onRegionSplit() returns null at the end).
> This results in onRegionSplitReverted() being called for SPLIT_PONR and SPLIT as well.
> When a region split fails, AssignmentManager#onRegionTransition() should account for the failure properly so that quota bookkeeping stays consistent.
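
To make the bookkeeping problem in the issue description concrete, here is a minimal, self-contained sketch. It is not the attached patch and does not use HBase classes: `SplitQuotaSketch`, `RegionCounter`, `handleBuggy`, `handleGuarded` and `replay` are hypothetical names, and the "+1 region once the quota check passes" accounting is an assumption made for illustration. Only the three TransitionCode names come from the issue. It compares reverting whenever errorMsg is empty with reverting only on SPLIT_REVERTED.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitQuotaSketch {

    /** The three split-related transition codes named in the issue. */
    enum TransitionCode { SPLIT_PONR, SPLIT, SPLIT_REVERTED }

    /** Hypothetical stand-in for quota bookkeeping: region count per table. */
    static final class RegionCounter {
        private final Map<String, Integer> counts = new HashMap<>();

        void set(String table, int regions) { counts.put(table, regions); }

        int get(String table) { return counts.getOrDefault(table, 0); }

        // Assumption: a split that passes the quota check is counted as +1 region.
        void onSplitApproved(String table) { counts.merge(table, 1, Integer::sum); }

        // Undo the +1 when the split is rolled back.
        void onSplitReverted(String table) { counts.merge(table, -1, Integer::sum); }
    }

    /** Behaviour described in the issue: revert whenever errorMsg is empty. */
    static void handleBuggy(TransitionCode code, String errorMsg,
                            String table, RegionCounter counter) {
        if (errorMsg == null || errorMsg.isEmpty()) {
            counter.onSplitReverted(table);
        }
    }

    /** Guarded variant: revert only when the split was actually reverted. */
    static void handleGuarded(TransitionCode code, String errorMsg,
                              String table, RegionCounter counter) {
        if ((errorMsg == null || errorMsg.isEmpty())
                && code == TransitionCode.SPLIT_REVERTED) {
            counter.onSplitReverted(table);
        }
    }

    /** Starts with one region, approves one split, then replays the codes. */
    static int replay(boolean guarded, TransitionCode... codes) {
        RegionCounter counter = new RegionCounter();
        counter.set("t", 1);
        counter.onSplitApproved("t");
        for (TransitionCode code : codes) {
            if (guarded) {
                handleGuarded(code, null, "t", counter);
            } else {
                handleBuggy(code, null, "t", counter);
            }
        }
        return counter.get("t");
    }

    public static void main(String[] args) {
        System.out.println("successful split, buggy:   "
            + replay(false, TransitionCode.SPLIT_PONR, TransitionCode.SPLIT));
        System.out.println("successful split, guarded: "
            + replay(true, TransitionCode.SPLIT_PONR, TransitionCode.SPLIT));
        System.out.println("reverted split, guarded:   "
            + replay(true, TransitionCode.SPLIT_REVERTED));
    }
}
```

In this toy model a successful split replayed through the unguarded handler undercounts the table's regions, while the guarded handler only gives the slot back on SPLIT_REVERTED. The same shape would apply to the merge question raised in the comment, with the merge transition codes in place of the split ones.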