[ https://issues.apache.org/jira/browse/HBASE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107299#comment-13107299 ]
Hudson commented on HBASE-4400:
-------------------------------

Integrated in HBase-TRUNK #2228 (See [https://builds.apache.org/job/HBase-TRUNK/2228/])
HBASE-4400 fixed up the anonymous Abortable in createAndForceNodeToOpenedState()
HBASE-4400 rename metaRegion to region in HBaseTestingUtility.createAndForceNodeToOpenedState()
HBASE-4400 .META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED (Ramkrishna)

tedyu :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

tedyu :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

> .META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-4400
>                 URL: https://issues.apache.org/jira/browse/HBASE-4400
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4400_0.90.patch, HBASE-4400_0.90_1.patch, HBASE-4400_trunk.patch, HBASE-4400_trunk_1.patch
>
>
> Start two region servers (RS1 and RS2).
> RS2 is hosting .META., but it goes down while the region is being processed.
> Now restart the master and RS1. The master gets the RS name from the znode in RS_ZK_REGION_OPENED, but because RS2 is still not online, the master is not able to process .META. at all.
> Please find the logs:
> {noformat}
> 2011-09-14 16:43:51,949 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1315998828523, region=70236052/-ROOT-
> 2011-09-14 16:43:51,968 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=1, rit=false, location=linux76:60020
> 2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed to find linux146,60020,1315998414623 in list of online servers; skipping registration of open of .META.,,1.1028785192
> 2011-09-14 16:43:51,971 INFO org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 1028785192/.META.
> 2011-09-14 16:43:51,983 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux76,60020,1315998828523, region=70236052/-ROOT-
> 2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
> 2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,998 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
> 2011-09-14 16:43:51,999 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on linux76,60020,1315998828523
> 2011-09-14 16:44:00,945 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=linux146,60020,1315998839724, regionCount=0, userLoad=false
> 2011-09-14 16:46:20,003 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPEN, ts=0
> 2011-09-14 16:46:20,004 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> {noformat}
> {code}
> regionsInTransition.put(encodedRegionName, new RegionState(
>     regionInfo, RegionState.State.OPEN, data.getStamp()));
> ................
> } else {
>   HServerInfo hsi = this.serverManager.getServerInfo(sn);
>   if (hsi == null) {
>     LOG.info("Failed to find " + sn +
>       " in list of online servers; skipping registration of open of " +
>       regionInfo.getRegionNameAsString());
>   } else {
>     new OpenedRegionHandler(master, this, regionInfo, hsi).process();
>   }
> }
> {code}
> So the timeout monitor is not able to do anything here:
> {code}
> LOG.error("Region has been OPEN for too long, " +
>   "we don't know where region was opened so can't do anything");
> synchronized(regionState) {
>   regionState.update(regionState.getState());
> }
> {code}

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
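The skip path quoted in the report can be modelled in a few lines. The following is a hypothetical, simplified sketch, not HBase code: FailoverSketch, Action and decide are made-up names, and server names are compared as the host,port,startcode strings that appear in the logs. The REASSIGN branch is an assumed alternative included only for contrast; it is not necessarily what the committed AssignmentManager change does.

```java
import java.util.Set;

// Hypothetical model, not HBase code: FailoverSketch, Action and decide are
// illustrative names. It only captures the branch quoted in HBASE-4400.
public class FailoverSketch {

  // Possible outcomes for a region whose znode says RS_ZK_REGION_OPENED.
  enum Action {
    PROCESS_OPENED, // hosting server is online: handle the OPENED event
    SKIP_AND_WAIT,  // reported behaviour: log and leave the region in RIT as OPEN
    REASSIGN        // assumed alternative: treat the region as offline and assign it again
  }

  // serverInZnode: "host,port,startcode" read from the unassigned znode.
  // onlineServers: names of servers currently registered with the master.
  // reassignIfDead: false models the reported code path, true the assumed alternative.
  static Action decide(String serverInZnode, Set<String> onlineServers,
                       boolean reassignIfDead) {
    if (onlineServers.contains(serverInZnode)) {
      return Action.PROCESS_OPENED;
    }
    // A restarted RS registers with a new start code, so the old name never matches.
    return reassignIfDead ? Action.REASSIGN : Action.SKIP_AND_WAIT;
  }

  public static void main(String[] args) {
    // Server names copied from the logs in the report.
    Set<String> online = Set.of("linux76,60020,1315998828523",
                                "linux146,60020,1315998839724");
    String metaHost = "linux146,60020,1315998414623";
    System.out.println(decide(metaHost, online, false)); // SKIP_AND_WAIT
    System.out.println(decide(metaHost, online, true));  // REASSIGN
  }
}
```

With the values from the logs, linux146 comes back as linux146,60020,1315998839724, a different start code from the linux146,60020,1315998414623 recorded in the znode, so even after it registers, the name in the znode does not match any online server. On the skip path the region stays in regionsInTransition as OPEN with no usable location, which is consistent with the timeout monitor only logging "Region has been OPEN for too long".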