[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091329#comment-13091329
 ] 

Ted Yu commented on HBASE-4124:
---

@Jinchao:
Can you prepare a patch for TRUNK as well?

> ZK restarted while assigning a region, new active HM re-assign it but the RS 
> warned 'already online on this server'.
> 
>
> Key: HBASE-4124
> URL: https://issues.apache.org/jira/browse/HBASE-4124
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: fulin wang
>Assignee: gaojinchao
> Fix For: 0.90.5
>
> Attachments: HBASE-4124_Branch90V1_trial.patch, 
> HBASE-4124_Branch90V2.patch, HBASE-4124_Branch90V3.patch, 
> HBASE-4124_Branch90V4.patch, log.txt
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> ZK restarted while assigning a region, new active HM re-assign it but the RS 
> warned 'already online on this server'.
> Issue:
> The RS failed because of 'already online on this server' and returned; the HM 
> cannot receive the message and reports 'Regions in transition timed out'.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090931#comment-13090931
 ] 

Ted Yu commented on HBASE-4124:
---

+1 on patch v4.
Minor comment:
A few lines such as the following are longer than 80 characters:
{code}
+(null == data.getServerName() || !serverManager.isServerOnline(data.getServerName()))) {
{code}
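
One way to keep a condition like this within 80 columns is to break the line at the operator, or to pull the test into a small helper. The sketch below is only an illustration: `ownerMissingOrOffline` and the `Set<String>` of online servers are hypothetical stand-ins, not names from the patch.

```java
import java.util.Set;

// Hypothetical helper; the patch itself calls serverManager.isServerOnline().
class LineWrapSketch {

  /** True when no server is recorded or the recorded server is not online. */
  static boolean ownerMissingOrOffline(String serverName,
      Set<String> onlineServers) {
    return serverName == null
        || !onlineServers.contains(serverName);
  }
}
```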





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090759#comment-13090759
 ] 

ramkrishna.s.vasudevan commented on HBASE-4124:
---

bq. sorry. step 3: startup master again.
This statement confused me a bit.
Thanks for your explanation. :)





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090752#comment-13090752
 ] 

Ted Yu commented on HBASE-4124:
---

+1 on patch version 3.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090698#comment-13090698
 ] 

gaojinchao commented on HBASE-4124:
---

@ram
"How come we have a dead RS if we don't kill the RS?"

gao: If you stop the cluster, the meta still records the server information.

"If the master is also killed, how can the regions be assigned to some other RS?"

gao: When the master starts up, it collects the regions for the same region server and 
 calls sendRegionOpen(destination, regions).
 If the number of regions is relatively large, the region server needs a long time 
 to open them.
 If the master crashes during that time, the new master may reopen the regions on 
 another region server.
 





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090677#comment-13090677
 ] 

gaojinchao commented on HBASE-4124:
---

@Ted 
I have run all the tests. Thanks for your work.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090330#comment-13090330
 ] 

Ted Yu commented on HBASE-4124:
---

All tests passed with patch v3 for 0.90 branch.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090298#comment-13090298
 ] 

ramkrishna.s.vasudevan commented on HBASE-4124:
---

@Gao
Correct me if I am wrong. I can understand the intention behind the logic.
{code}
+  RegionTransitionData data = ZKAssign.getData(watcher,
+      regionInfo.getEncodedName());
+
+  //When zk node has been updated by a living server, we consider that
+  //this region server is handling it.
+  //So we should skip it and process it in processRegionsInTransition.
+  if (data != null && data.getServerName() != null &&
+      serverManager.isServerOnline(data.getServerName())){
+    LOG.info("The region " + regionInfo.getEncodedName() +
+      "is processing by " + data.getServerName());
+    continue;
+  }
{code}
But suppose that, as part of rebuildUserRegions(), the master finds a server to be 
dead and adds that RS to the dead servers, and, as you said, the master was also killed.
How come we have a dead RS if we don't kill the RS? And if the master is also 
killed, how can the regions be assigned to some other RS (how can the state 
change in ZK for that region node)?
Maybe I am not understanding something. If you can explain this, it will help me 
with the TimeoutMonitor. 
The rest looks fine.
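
The intent of the quoted check can be sketched in isolation as follows. This is a simplified model, not HBase code: the znode contents are reduced to a hypothetical map from encoded region name to server name, and the server manager to a set of online server names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the failover check quoted above; not HBase code.
class FailoverSkipSketch {

  /**
   * Returns the regions the new master should reassign itself.
   * znodeOwner maps an encoded region name to the server named in its
   * znode; a missing entry models "no znode data" (data == null).
   */
  static List<String> regionsToReassign(List<String> deadServerRegions,
      Map<String, String> znodeOwner, Set<String> onlineServers) {
    List<String> result = new ArrayList<>();
    for (String region : deadServerRegions) {
      String owner = znodeOwner.get(region);
      // A live server has updated the znode: leave the region to the
      // regions-in-transition processing instead of assigning it again.
      if (owner != null && onlineServers.contains(owner)) {
        continue;
      }
      result.add(region);
    }
    return result;
  }
}
```

With regions r1 (znode owned by a live server), r2 (owned by a server that is gone) and r3 (no znode data), only r2 and r3 are returned for reassignment.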





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090250#comment-13090250
 ] 

Ted Yu commented on HBASE-4124:
---

Once patch v3 receives +1 vote, a patch for TRUNK should be made.
Thanks for the effort.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-24 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090141#comment-13090141
 ] 

gaojinchao commented on HBASE-4124:
---

@Ted
Does it need a patch for TRUNK? 
TRUNK has changed a lot, so I need some time to study it.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089987#comment-13089987
 ] 

Ted Yu commented on HBASE-4124:
---

HBASE-4124_Branch90V2.patch makes sense.
Please correct grammar in javadocs.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-23 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089890#comment-13089890
 ] 

gaojinchao commented on HBASE-4124:
---

The RS isn't dead. I can reproduce and verify it.

The ZK status has already changed before the region is added to the RIT set. You can 
look at the function processDeadServers.
That is the reason why a region is assigned twice. 

{code}
// If region was in transition (was in zk) force it offline for reassign
try {
  //Process with existing RS shutdown code
  boolean assign =
    ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
      this.catalogTracker);
  if (assign) {
    ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
      master.getServerName());
  }
{code}







[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089261#comment-13089261
 ] 

ramkrishna.s.vasudevan commented on HBASE-4124:
---

@Gao
bq. step 3: startup master again.

As per the scenario you have described, when the master restarted, had the RS 
opened the region? I think the scenario here is that the RS is also dead.
If so, the assignment manager will try assigning it to a new RS. Do you see 
any problem here? 
If the RS is alive, then the znode will be in the OPENED state and 
processRIT will take care of clearing the node, as the region is already opened. Could 
you be more clear on the state of the RS after you killed the master, and also on the 
state of the znode in ZooKeeper for that region?






[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-20 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088173#comment-13088173
 ] 

gaojinchao commented on HBASE-4124:
---

I have added a test case for opening a region.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-19 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088147#comment-13088147
 ] 

gaojinchao commented on HBASE-4124:
---

Sorry, step 3 should be: start up the master again.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-19 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088146#comment-13088146
 ] 

gaojinchao commented on HBASE-4124:
---

I have finished the test. The scenario is:
step 1: start up the cluster.
step 2: abort the master as soon as it finishes calling "sendRegionOpen(destination, regions)".
step 3: start up the cluster again.

The above steps will reproduce the issue. 
When the master fails over, the meta records the dead server, but the region is 
being processed by a living region server.
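
The reproduction steps above can be modelled with a toy sketch. Everything in it is an assumption made for illustration: the server names `rsA` and `rsB` are made up, and the flag `skipWhenLiveServerOwnsZnode` stands for a check that a live server is already handling the region, which is the kind of guard discussed in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reproduction steps; server names are made up.
class DoubleAssignSketch {

  /** Lists the servers asked to open one region across a master failover. */
  static List<String> openRequests(boolean skipWhenLiveServerOwnsZnode) {
    List<String> requests = new ArrayList<>();

    // Steps 1-2: the old master sends the open request to rsA and aborts.
    // rsA stays alive and keeps opening the region.
    requests.add("rsA");
    String znodeServer = "rsA";
    boolean znodeServerOnline = true;

    // Step 3: the restarted master finds a dead server recorded in meta
    // for this region and considers reassigning it.
    boolean beingHandled = znodeServer != null && znodeServerOnline;
    if (!(skipWhenLiveServerOwnsZnode && beingHandled)) {
      requests.add("rsB"); // second open request for the same region
    }
    return requests;
  }

  public static void main(String[] args) {
    System.out.println("without check: " + openRequests(false));
    System.out.println("with check:    " + openRequests(true));
  }
}
```

Without the check the region receives two open requests; with it, only the original one.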






[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-08-13 Thread fulin wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084780#comment-13084780
 ] 

fulin wang commented on HBASE-4124:
---

gaojinchao, please fix the issue. Thanks.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-07-25 Thread fulin wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070342#comment-13070342
 ] 

fulin wang commented on HBASE-4124:
---

I can't find where getRegionsInTransitionInRS().add() is called, so I do not 
understand why this function was added.
As for the 'already online on this server' error, I think the region should 
be closed or reassigned. I am trying to make a patch.





[jira] [Commented] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.

2011-07-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070103#comment-13070103
 ] 

stack commented on HBASE-4124:
--

HBASE-3741 changes the behavior here: we now notice if we are asked to 
open a region that is already open, and we throw an exception back to the 
master. I think the master will now reassign it elsewhere, which is not what we 
want if it's a RegionAlreadyInTransitionException. This will make it so we do 
not keep retrying, but I think there is more to do.
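
A minimal sketch of the master-side decision this comment argues for, under stated assumptions: the exception class below is a local stand-in that only borrows the name RegionAlreadyInTransitionException, and the `Action` enum and `decide` method are hypothetical, not part of HBase.

```java
// Hypothetical decision logic; not the actual AssignmentManager behavior.
class ReassignDecisionSketch {

  /** Local stand-in for the exception named in the comment above. */
  static class RegionAlreadyInTransitionException extends Exception {
    RegionAlreadyInTransitionException(String message) {
      super(message);
    }
  }

  enum Action { DONE, WAIT_FOR_REGION_SERVER, REASSIGN_ELSEWHERE }

  /** Decides what to do after an open request; null means it succeeded. */
  static Action decide(Exception failure) {
    if (failure == null) {
      return Action.DONE;
    }
    // The region server is already working on this region, so sending
    // the region to another server would assign it twice.
    if (failure instanceof RegionAlreadyInTransitionException) {
      return Action.WAIT_FOR_REGION_SERVER;
    }
    return Action.REASSIGN_ELSEWHERE;
  }
}
```

The point of the design is that only failures other than "already in transition" lead to picking another server.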
