[ https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116059#comment-13116059 ]
Hudson commented on HBASE-4455: ------------------------------- Integrated in HBase-0.92 #24 (See [https://builds.apache.org/job/HBase-0.92/24/]) HBASE-4455 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager (Ming Ma) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java > Rolling restart RSs scenario, -ROOT-, .META. regions are lost in > AssignmentManager > ---------------------------------------------------------------------------------- > > Key: HBASE-4455 > URL: https://issues.apache.org/jira/browse/HBASE-4455 > Project: HBase > Issue Type: Bug > Reporter: Ming Ma > Assignee: Ming Ma > Fix For: 0.92.0 > > > Keep Master up all the time, do rolling restart of RSs like this - stop RS1, > wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start > RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. > regions aren't in "regions in transtion" from AssignmentManager point of > view, but they aren't assigned to any regions. Here are the issues. > 1. .-ROOT- or .META. location is stale when MetaServerShutdownHandler is > invoked to check if it contains -ROOT- region. That is due to long delay from > ZK notification and async nature of the system. Here is an example, even > though new root region server sea-lab-1,60020,1316380133656 is set at T2, at > T3 the shutdown process for sea-lab-1,60020,1316380133656, the root location > still points to old server sea-lab-3,60020,1316380037898. > T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:6 > 0000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode > /hbase/root-regio > n-server and set watcher; sea-lab-3,60020,1316380037898 > T2: 2011-09-18 14:08:57,173 INFO > org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region > location in ZooKeeper as sea-lab-1,60020,1316380133656 > T3: 2011-09-18 14:10:26,393 DEBUG > org.apache.hadoop.hbase.master.ServerManager: Adde > d=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler > to be executed, root=false, meta=true, current Root Location: > sea-lab-3,60020,1316380037898 > T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:6 > 0000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode > /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656 > 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or > .META. availability could be blocked. If meanwhile, the new server that > -ROOT- or .META. is being assigned restarted, another instance of > MetaServerShutdownHandler is queued. Eventually, all > MetaServerShutdownHandler worker threads are filled up. It looks like > HBASE-4245. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira