[ 
https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843448#comment-13843448
 ] 

Jeffrey Zhong commented on HBASE-10101:
---------------------------------------

The ZK cleanup only clears the master address node and the RS nodes, which 
should be removed when a cluster is shut down. The added steps make sure we 
have a clean restart for normal unit tests, and there are special cases for 
master (cluster) restart scenarios. 

I prefer the test case in TestAssignmentManagerOnCluster because it is about 
regions not being assigned during a cluster restart.

Below are my comments on the trunk patch:

{code}
+      regionStates.setLastRegionServerOfRegion(sn, encodedName);
+      if (regionInfo.isMetaRegion()) {
+        // If it's meta region, reset the meta location.
+        // So that master knows the right meta region server.
+        MetaRegionTracker.setMetaLocation(watcher, sn);
+      }
{code}
The above is a little dramatic because we are only setting internal in-memory 
state to point at some server. This will confuse future readers.

{code}
-          if (expireIfOnline(currentMetaServer)) {
+          if (!serverManager.isServerDead(currentMetaServer)) {
{code}
This isn't ideal because it opens a race condition: a dead meta server may not 
be reported (SessionException) in time. We could then skip the meta re-assign, 
and the master would fail to start.
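
To illustrate the race, here is a toy sketch. It is not HBase code: 
ToyServerManager, register() and reportDead() are hypothetical names, and the 
semantics given to isServerDead() and expireIfOnline() are my assumptions for 
the sake of the example. It only shows why a passive "is it already known to be 
dead?" check can be fooled by a crash that has not been reported yet, while an 
active expire does not depend on the report arriving.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; these are NOT HBase classes or HBase method signatures.
class MetaCheckRaceSketch {

  /** Hypothetical stand-in for the master's bookkeeping of region servers. */
  static class ToyServerManager {
    private final Set<String> online = new HashSet<>();
    private final Set<String> dead = new HashSet<>();

    void register(String server) {
      online.add(server);
    }

    /** Runs only after the master has been told about the failure. */
    void reportDead(String server) {
      online.remove(server);
      dead.add(server);
    }

    /** Passive check: true only if the failure has already been reported. */
    boolean isServerDead(String server) {
      return dead.contains(server);
    }

    /** Assumed semantics: expire the server if it is still listed as online. */
    boolean expireIfOnline(String server) {
      if (!online.contains(server)) {
        return false;
      }
      reportDead(server);
      return true;
    }
  }

  public static void main(String[] args) {
    ToyServerManager sm = new ToyServerManager();
    String metaServer = "rs1,60020,1"; // hypothetical server name
    sm.register(metaServer);

    // The process behind metaServer has crashed, but reportDead() has not run yet,
    // so the passive check still treats the server as alive.
    boolean looksAlive = !sm.isServerDead(metaServer);
    System.out.println("passive check says alive: " + looksAlive);

    // The active check does not wait for the failure report.
    boolean expired = sm.expireIfOnline(metaServer);
    System.out.println("active check expired it: " + expired);
    System.out.println("dead after active check: " + sm.isServerDead(metaServer));
  }
}
```

In this model the window between the crash and reportDead() is exactly where 
the passive check gives the wrong answer.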

[~jxiang] Your latest patch looks good to me except for the changes in 
HMaster.java. I'd prefer my v3-update patch unless you feel strongly about 
your trunk patch. 

I'll let you decide which to choose and move on. Thanks.


> testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
> ------------------------------------------------------------------
>
>                 Key: HBASE-10101
>                 URL: https://issues.apache.org/jira/browse/HBASE-10101
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jimmy Xiang
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-10101-v2.patch, hbase-10101-v3-update.patch, 
> hbase-10101-v3.patch, hbase-10101.patch, test.log, trunk-10101.patch, 
> trunk-10101_v2.patch
>
>
> Sometimes this test times out. The log is attached. It could be because the 
> new cluster takes a while to process the dead server or to assign meta.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
