Race between online altering and splitting kills the master
-----------------------------------------------------------

                 Key: HBASE-4729
                 URL: https://issues.apache.org/jira/browse/HBASE-4729
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.0
            Reporter: Jean-Daniel Cryans
             Fix For: 0.92.0, 0.94.0


I was running an online alter while regions were splitting, and suddenly the 
master died and left my table half-altered (haven't restarted the master yet).

What killed the master:

{quote}
2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
Unexpected ZK exception creating node CLOSING
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
        at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
        at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
        at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
        at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
        at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{quote}

A znode was created because the region server was splitting the region 4 
seconds before:

{quote}
2011-11-02 17:06:40,704 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
2011-11-02 17:06:40,704 DEBUG 
org.apache.hadoop.hbase.regionserver.SplitTransaction: 
regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x132f043bbde0710 Attempting to transition node 
f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLITTING
...
2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLIT
2011-11-02 17:06:44,061 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
master to process the split for f7e1783e65ea8d621a4bc96ad310f101
{quote}

Now that the master is dead the region server is spewing those last two lines 
like mad.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to