[ https://issues.apache.org/jira/browse/HBASE-13935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592308#comment-14592308 ]
Stephen Yuan Jiang commented on HBASE-13935:
--------------------------------------------

[~mbertozzi], the failed server was gone. Before the patch, it would fail if the table was in either the ENABLING or the ENABLED state:

{code}
if (!assignmentManager.getTableStateManager().setTableStateIfNotInStates(tableName,
    ZooKeeperProtos.Table.State.ENABLING,
    ZooKeeperProtos.Table.State.ENABLING,
    ZooKeeperProtos.Table.State.ENABLED)) {
  throw new TableExistsException(tableName);
}
{code}

If we have an orphaned ENABLING znode, then before HMaster#initNamespace() is called, "this.assignmentManager.joinCluster();" is executed, which calls "AssignmentManager#recoverTableInEnablingState()" to remove the ENABLING znode. That is why my unit test only sets the state to ENABLED, and my guess is that the orphaned znode we hit in testing was probably an ENABLED znode.

[~mbertozzi] I thought this would not be a problem with PV2; however, we hit this twice with PV2 enabled in branch-1.1 testing a couple of weeks ago (HBASE-13815: originally I thought the rollback had some flaw, but I carefully examined the code and I think the rollback is correct). I applied the same skip logic locally and we have not seen this problem again in branch-1.1 testing.

> Orphaned namespace table ZK node should not prevent master to start
> -------------------------------------------------------------------
>
>                 Key: HBASE-13935
>                 URL: https://issues.apache.org/jira/browse/HBASE-13935
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>   Affects Versions: 1.0.0, 0.98.13
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 0.98.14, 1.0.2
>
>         Attachments: HBASE-13935.v1-0.98.patch, HBASE-13935.v1-branch-1.0.patch
>
>
> Before we have the state-of-the-art Procedure V2 feature (HBase 1.0 release or
> older), we frequently see the following issue (an orphaned ZK node) that prevents
> the master from starting (at least in testing):
> {noformat}
> 2015-06-16 17:54:36,472 FATAL [master:10.0.0.99:60000] master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.TableExistsException: hbase:namespace
>   at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:137)
>   at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
>   at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
>   at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1123)
>   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:947)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:618)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-16 17:54:36,472 INFO [master:10.0.0.99:60000] master.HMaster: Aborting
> {noformat}
> The above call trace is from a 0.98.x test run. We saw a similar issue in a
> 1.0.x run, too.
> The proposed fix is to ignore the ZK node and force namespace table creation
> to complete so that the master can start successfully.
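
To make the failure mode and the skip logic discussed above easier to follow, here is a minimal, self-contained sketch. It is a toy model only, not the attached patch and not the HBase API: `ToyTableStateManager`, `prepareCreate`, and the `tolerateOrphan` flag are hypothetical names, an in-memory map stands in for the ZK znodes, and `IllegalStateException` stands in for `TableExistsException`. Forcing the state back to ENABLING for `hbase:namespace` is an assumption about what "ignore the ZK node" could look like.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model of the table-state check; every name here is hypothetical.
class OrphanedZnodeSketch {
    enum State { ENABLING, ENABLED, DISABLING, DISABLED }

    static final String NAMESPACE_TABLE = "hbase:namespace";

    // In-memory stand-in for the ZK-backed table state manager.
    static class ToyTableStateManager {
        private final Map<String, State> states = new HashMap<>();

        synchronized void put(String table, State state) { states.put(table, state); }

        synchronized State get(String table) { return states.get(table); }

        // Sets newState unless the table is already in one of the blocked
        // states; returns true only if the state was set.
        synchronized boolean setTableStateIfNotInStates(
                String table, State newState, State... blocked) {
            State current = states.get(table);
            if (current != null && Arrays.asList(blocked).contains(current)) {
                return false;
            }
            states.put(table, newState);
            return true;
        }
    }

    // Returns true if table creation may proceed. With tolerateOrphan set,
    // a leftover ENABLING/ENABLED entry for the namespace table is ignored
    // (assumed behavior of the skip logic, not the actual patch).
    static boolean prepareCreate(ToyTableStateManager tsm, String table,
                                 boolean tolerateOrphan) {
        boolean set = tsm.setTableStateIfNotInStates(
                table, State.ENABLING, State.ENABLING, State.ENABLED);
        if (set) {
            return true;
        }
        if (tolerateOrphan && NAMESPACE_TABLE.equals(table)) {
            tsm.put(table, State.ENABLING); // overwrite the orphaned state
            return true;
        }
        throw new IllegalStateException("Table exists: " + table);
    }

    public static void main(String[] args) {
        ToyTableStateManager tsm = new ToyTableStateManager();
        tsm.put(NAMESPACE_TABLE, State.ENABLED); // simulate the orphaned znode

        try {
            prepareCreate(tsm, NAMESPACE_TABLE, false);
        } catch (IllegalStateException e) {
            System.out.println("strict check failed: " + e.getMessage());
        }
        System.out.println("tolerant check proceeds: "
                + prepareCreate(tsm, NAMESPACE_TABLE, true));
    }
}
```

The strict path mirrors the quoted check, where an existing ENABLING or ENABLED entry aborts creation; the tolerant path only relaxes it for the namespace table, so other tables still fail fast on a real name collision.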