[ 
https://issues.apache.org/jira/browse/HBASE-13935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592351#comment-14592351
 ] 

Matteo Bertozzi commented on HBASE-13935:
-----------------------------------------

{quote}If we have an orphaned ENABLING znode, before HMaster#initNamespace() 
was called, "this.assignmentManager.joinCluster();" was executed, which would 
call "AssignmentManager#recoverTableInEnablingState()" to remove the ENABLING 
znode. That is why my unit test only set to ENABLED and my guess is the 
orphaned znode in the test probably has ENABLED znode.{quote}
The znode state is fine. What I don't know (sorry, I haven't looked at the code 
yet) is what happens if we keep going when we already have some state on disk. 
I know that if we are in the same situation as the unit test everything is 
fine, but is that a realistic situation? Can we end up with some data in the 
dir, abort, restart the master, skip the znode check, and now add another 
region in the systable, which will cause hbck to complain? 
If the above is not possible, the patch is good. 
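
To make the question concrete, here is a toy sketch of the restart case I am 
worried about. It is not HBase code: every class, field, and method name below 
is made up for illustration, and it assumes the aborted attempt left exactly 
one region dir behind. It only contrasts a forced create that always adds a 
new region with one that reuses what is already on disk.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. None of these names are HBase APIs.
final class OrphanedZnodeSketch {

    // Hypothetical stand-in for the state a failed create can leave behind.
    static final class ClusterState {
        boolean znodePresent;                                     // orphaned table-state znode
        final List<String> regionsOnDisk = new ArrayList<>();     // region dirs in the table dir
        final List<String> regionsInSysTable = new ArrayList<>(); // rows in the system table
    }

    // Assumed starting point: the first attempt wrote one region dir, then the master aborted.
    static ClusterState afterAbortedCreate() {
        ClusterState s = new ClusterState();
        s.znodePresent = true;
        s.regionsOnDisk.add("region-A");
        return s;
    }

    // Forced create that skips the znode check and always adds a brand-new region.
    static void forceCreateNaive(ClusterState s, String newRegion) {
        s.regionsOnDisk.add(newRegion);
        s.regionsInSysTable.add(newRegion);
    }

    // Forced create that skips the znode check but reuses regions already on disk.
    static void forceCreateReusing(ClusterState s, String newRegion) {
        if (s.regionsOnDisk.isEmpty()) {
            s.regionsOnDisk.add(newRegion);
        }
        for (String region : s.regionsOnDisk) {
            if (!s.regionsInSysTable.contains(region)) {
                s.regionsInSysTable.add(region);
            }
        }
    }

    // In this toy model, "consistent" means one region, present both on disk and in the table.
    static boolean consistent(ClusterState s) {
        return s.regionsOnDisk.size() == 1 && s.regionsOnDisk.equals(s.regionsInSysTable);
    }

    public static void main(String[] args) {
        ClusterState naive = afterAbortedCreate();
        forceCreateNaive(naive, "region-B");
        System.out.println("naive consistent: " + consistent(naive));

        ClusterState reusing = afterAbortedCreate();
        forceCreateReusing(reusing, "region-B");
        System.out.println("reusing consistent: " + consistent(reusing));
    }
}
```

If the patched code path can behave like forceCreateNaive() after an abort, 
that is the case I would expect hbck to complain about. If it always behaves 
like forceCreateReusing(), or the leftover state cannot exist, there is no 
problem.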

In proc-v2 the AM recoverTableInState() is still there, and it does the wrong 
thing for us. I think there is a JIRA to remove that.

> Orphaned namespace table ZK node should not prevent master to start
> -------------------------------------------------------------------
>
>                 Key: HBASE-13935
>                 URL: https://issues.apache.org/jira/browse/HBASE-13935
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.0, 0.98.13
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 0.98.14, 1.0.2
>
>         Attachments: HBASE-13935.v1-0.98.patch, 
> HBASE-13935.v1-branch-1.0.patch
>
>
> Before the state-of-the-art Procedure V2 feature (HBase 1.0 release or 
> older), we frequently saw the following issue (an orphaned ZK node) that 
> prevents the master from starting (at least in testing):
> {noformat}
> 2015-06-16 17:54:36,472 FATAL [master:10.0.0.99:60000] master.HMaster: 
> Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.TableExistsException: hbase:namespace
>       at 
> org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:137)
>       at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
>       at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
>       at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1123)
>       at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:947)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:618)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-06-16 17:54:36,472 INFO  [master:10.0.0.99:60000] master.HMaster: 
> Aborting
> {noformat}
> The above call trace is from a 0.98.x test run.  We saw a similar issue in 
> a 1.0.x run, too.  
> The proposed fix is to ignore the ZK node and force namespace table creation 
> to complete so that the master can start successfully.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
