[ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843448#comment-13843448 ]
Jeffrey Zhong commented on HBASE-10101:
---------------------------------------

The ZK cleanup only clears the master address node and the RS nodes, which should be removed when a cluster is shut down. The added steps make sure we have a clean restart for normal unit tests, and there are special cases for master (cluster) restart scenarios. I prefer the test case in TestAssignmentManagerOnCluster because it's about regions not being assigned during a cluster restart.

Below are my comments on the trunk patch:

{code}
+ regionStates.setLastRegionServerOfRegion(sn, encodedName);
+ if (regionInfo.isMetaRegion()) {
+   // If it's meta region, reset the meta location.
+   // So that master knows the right meta region server.
+   MetaRegionTracker.setMetaLocation(watcher, sn);
+ }
{code}

The above is a little dramatic because we just set internal memory state to some server. This will cause confusion for future readers.

{code}
- if (expireIfOnline(currentMetaServer)) {
+ if (!serverManager.isServerDead(currentMetaServer)) {
{code}

This isn't ideal because there could be a race condition in which a dead meta server is not reported as dead (SessionException) in time. We could then skip the meta re-assignment, and the master would be unable to start.

[~jxiang] Your latest patch looks good to me except for the changes in HMaster.java. I'd prefer my v3-update patch unless you feel strongly about your trunk patch. I'll let you decide which to choose and move on. Thanks.

> testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
> ------------------------------------------------------------------
>
>                 Key: HBASE-10101
>                 URL: https://issues.apache.org/jira/browse/HBASE-10101
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jimmy Xiang
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-10101-v2.patch, hbase-10101-v3-update.patch,
> hbase-10101-v3.patch, hbase-10101.patch, test.log, trunk-10101.patch,
> trunk-10101_v2.patch
>
>
> Sometimes, I got this test timed out. The log is attached.
> It could be because the new cluster takes a while to process the dead server, or assign
> meta.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)