[ https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284014#comment-13284014 ]
Zhihong Yu edited comment on HBASE-6088 at 5/26/12 4:29 PM:
------------------------------------------------------------

{code}
-  private static int createNodeSplitting(final ZooKeeperWatcher zkw,
-      final HRegionInfo region, final ServerName serverName)
-  throws KeeperException, IOException {
+  int createNodeSplitting(final ZooKeeperWatcher zkw, final HRegionInfo region,
...
+  int transitionNodeSplitting(final ZooKeeperWatcher zkw, final HRegionInfo parent,
{code}
The above two methods can remain static, right?

For transitionNodeSplitting(), please finish the following javadoc:
{code}
+   * @return
{code}
For the test,
{code}
+   * Before setting region in splitting transition if znode may be created and received some
+   * exception then znode may be present and splitting may not happen,this is to test whether znode
{code}
Please rewrite the first sentence above.
{code}
+  public void testSplitBeforeSplittingRegionInTransition() throws IOException,
{code}
Rename the above method testSplitBeforeSettingRegionInSplittingTransition()?
{code}
+  public static class MockedSplitTransaction extends SplitTransaction{
{code}
nit: add a space before {

was (Author: zhi...@ebaysf.com):
{code}
-  private static int createNodeSplitting(final ZooKeeperWatcher zkw,
-      final HRegionInfo region, final ServerName serverName)
-  throws KeeperException, IOException {
+  int createNodeSplitting(final ZooKeeperWatcher zkw, final HRegionInfo region,
...
+  int transitionNodeSplitting(final ZooKeeperWatcher zkw, final HRegionInfo parent,
{code}
The above two methods can remain static, right?
For transitionNodeSplitting(), please finish the following javadoc:
{code}
+   * @return
{code}

> Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6088
>                 URL: https://issues.apache.org/jira/browse/HBASE-6088
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Gopinathan A
>            Assignee: rajeshbabu
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-6088_94.patch, HBASE-6088_trunk.patch, HBASE-6088_trunk_2.patch
>
>
> Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
> {noformat}
> 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
> 2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> {noformat}
> {noformat}
> 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter waiting for transactions to get synced total 189377 synced till here 189365
> 2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
>     at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
>     ... 5 more
> 2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> {noformat}
> {noformat}
> 2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
> 2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
>     at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
>     at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
> {noformat}
> Due to the above exception, region splitting kept failing continuously for more than 5 hours.
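The failure pattern in the quoted logs (a znode that seems to have been created even though the client saw a ZooKeeper error, after which later split attempts fail with "already exists") can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: FakeZk, its exception classes, the method names and the znode path are all hypothetical stand-ins, not the ZooKeeper client API or HBase's SplitTransaction, and the cleanup variant is one possible recovery, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every name below is a hypothetical stand-in,
// not the ZooKeeper client API and not HBase's SplitTransaction.
final class SplittingZnodeSketch {

    /** The reply to a request was lost; the request may still have been applied. */
    static final class ConnectionLossException extends Exception {}

    /** A create was attempted on a path that already holds a node. */
    static final class NodeExistsException extends Exception {}

    /** Toy znode store: path -> name of the server that created the node. */
    static final class FakeZk {
        private final Map<String, String> nodes = new HashMap<>();
        private boolean dropNextReply;

        /** Make the next create succeed on the "server" but fail on the "client". */
        void dropNextReply() { dropNextReply = true; }

        void create(String path, String owner)
                throws ConnectionLossException, NodeExistsException {
            if (nodes.containsKey(path)) {
                throw new NodeExistsException();
            }
            nodes.put(path, owner);
            if (dropNextReply) {
                dropNextReply = false;
                throw new ConnectionLossException();
            }
        }

        void delete(String path) { nodes.remove(path); }

        boolean exists(String path) { return nodes.containsKey(path); }

        String ownerOf(String path) { return nodes.get(path); }
    }

    /** Split attempt whose rollback ignores the znode: a leftover node blocks every retry. */
    static boolean splitWithoutCleanup(FakeZk zk, String path, String server) {
        try {
            zk.create(path, server);
            return true;
        } catch (ConnectionLossException | NodeExistsException e) {
            return false; // rollback runs, but the znode (if any) stays behind
        }
    }

    /** Split attempt whose rollback removes a znode owned by this server before giving up. */
    static boolean splitWithCleanup(FakeZk zk, String path, String server) {
        try {
            zk.create(path, server);
            return true;
        } catch (ConnectionLossException | NodeExistsException e) {
            if (server.equals(zk.ownerOf(path))) {
                zk.delete(path); // only remove a node this server created
            }
            return false;
        }
    }

    public static void main(String[] args) {
        String path = "/hbase/unassigned/region-a"; // hypothetical znode path
        String server = "rs1";                      // hypothetical server name

        FakeZk stuck = new FakeZk();
        stuck.dropNextReply();
        // First attempt loses the reply; the retry then hits the leftover node.
        System.out.println("no cleanup, attempt 1: " + splitWithoutCleanup(stuck, path, server));
        System.out.println("no cleanup, attempt 2: " + splitWithoutCleanup(stuck, path, server));

        FakeZk recovering = new FakeZk();
        recovering.dropNextReply();
        // First attempt still fails, but rollback deletes the node, so the retry can proceed.
        System.out.println("cleanup, attempt 1: " + splitWithCleanup(recovering, path, server));
        System.out.println("cleanup, attempt 2: " + splitWithCleanup(recovering, path, server));
    }
}
```

In this model the ownership check before the delete matters: a node created by another server for the same region is left alone, so only this server's own leftover is removed.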