[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142378#comment-13142378 ]
Jean-Daniel Cryans commented on HBASE-4729:
-------------------------------------------

After the master got restarted the split was processed correctly but the table is still half-altered (need to see if this is going to work).

> Race between online altering and splitting kills the master
> -----------------------------------------------------------
>
>                 Key: HBASE-4729
>                 URL: https://issues.apache.org/jira/browse/HBASE-4729
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0, 0.94.0
>
>
> I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet).
> What killed the master:
> {quote}
> 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
>         at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
>         at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {quote}
> A znode was created because the region server was splitting the region 4 seconds before:
> {quote}
> 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
> 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
> 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
> ...
> 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
> 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101
> {quote}
> Now that the master is dead the region server is spewing those last two lines like mad.
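For illustration only, here is a minimal, hypothetical Java model of the race in the logs above. It does not use the HBase or ZooKeeper API: a ConcurrentMap stands in for the /hbase/unassigned directory, and putIfAbsent plays the role of ZooKeeper.create(), which fails with NodeExistsException when a node already exists at the path. The names ZnodeRaceSketch, ZnodeTable, unassignedPath and tryCreateClosing are made up for this sketch. It shows the region server creating its SPLITTING node first and the master-side CLOSING create then hitting NodeExists. Treating that case as "region already in transition" instead of a fatal error is one possible way to keep the master alive, not necessarily the fix committed for HBASE-4729.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical in-memory model of the split vs. online-alter race. NOT HBase or
 * ZooKeeper code: a ConcurrentMap stands in for /hbase/unassigned, and
 * putIfAbsent mimics ZooKeeper.create() failing when the znode already exists.
 */
public class ZnodeRaceSketch {

    /** Stand-in for KeeperException.NodeExistsException; unchecked here only to keep the sketch short. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    /** Toy znode store: path -> state name. */
    static class ZnodeTable {
        private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

        /** Create-only semantics: atomically fails if a node already exists at the path. */
        void create(String path, String state) {
            if (nodes.putIfAbsent(path, state) != null) {
                throw new NodeExistsException(path);
            }
        }

        String get(String path) { return nodes.get(path); }
    }

    /** Both sides key the node on the encoded region name, which is why they collide. */
    static String unassignedPath(String encodedRegionName) {
        return "/hbase/unassigned/" + encodedRegionName;
    }

    /**
     * Hypothetical master-side close. Instead of aborting on NodeExists, it reports
     * that the region is already in transition so the caller can skip or retry it.
     * Returns true only if the CLOSING node was created.
     */
    static boolean tryCreateClosing(ZnodeTable zk, String region) {
        String path = unassignedPath(region);
        try {
            zk.create(path, "CLOSING");
            return true;
        } catch (NodeExistsException e) {
            System.out.println("Region " + region + " already in transition ("
                    + zk.get(path) + "), not closing it now");
            return false;
        }
    }

    public static void main(String[] args) {
        ZnodeTable zk = new ZnodeTable();
        String region = "f7e1783e65ea8d621a4bc96ad310f101";

        // 1. The region server starts the split and creates its node first.
        zk.create(unassignedPath(region), "RS_ZK_REGION_SPLITTING");

        // 2. The bulk reopen triggered by the online alter then tries to close the same region.
        boolean closed = tryCreateClosing(zk, region);
        System.out.println("closing node created: " + closed); // prints "closing node created: false"
    }
}
```

The point of the model is that the collision itself is expected whenever a split and an alter touch the same region at the same time; what matters is whether the master treats NodeExists as a normal "busy region" outcome or as an unexpected, fatal ZK exception.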