[ https://issues.apache.org/jira/browse/HBASE-13935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592351#comment-14592351 ]
Matteo Bertozzi commented on HBASE-13935:
-----------------------------------------

{quote}If we have an orphaned ENABLING znode, before HMaster#initNamespace() was called, "this.assignmentManager.joinCluster();" was executed, which would call "AssignmentManager#recoverTableInEnablingState()" to remove the ENABLING znode. That is why my unit test only set to ENABLED and my guess is the orphaned znode in the test probably has ENABLED znode.{quote}

the znode state is fine, what I don't know (sorry, I haven't looked at the code yet) is what happens if we keep going and we already have some state on disk.
i know that if we are in the same situation as the unit test everything is fine, but is that a real situation? can we end up with some data in the dir, abort, restart the master, skip the znode check, and now add another region in the systable, which will cause hbck to complain?
if the above is not possible, patch is good.

in proc-v2 the AM recoverTableInState() is still there, and it does the wrong thing for us. I think there is a jira to remove that.

> Orphaned namespace table ZK node should not prevent master to start
> -------------------------------------------------------------------
>
>                 Key: HBASE-13935
>                 URL: https://issues.apache.org/jira/browse/HBASE-13935
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.0, 0.98.13
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 0.98.14, 1.0.2
>
>         Attachments: HBASE-13935.v1-0.98.patch, HBASE-13935.v1-branch-1.0.patch
>
>
> Before we have the state-of-art Procedure V2 feature (HBASE 1.0 release or
> older), we frequently see the following issue (orphaned ZK node) that prevents
> the master from starting (at least in testing):
> {noformat}
> 2015-06-16 17:54:36,472 FATAL [master:10.0.0.99:60000] master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.TableExistsException: hbase:namespace
>       at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:137)
>       at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
>       at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
>       at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1123)
>       at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:947)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:618)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-06-16 17:54:36,472 INFO [master:10.0.0.99:60000] master.HMaster: Aborting
> {noformat}
> The above call trace is from a 0.98.x test run. We saw similar issue in
> 1.0.x run, too.
> The proposed fix is to ignore the zk node and force namespace table creation
> to be complete so that master can start successfully.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)