[jira] [Issue Comment Edited] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-03 Thread ramkrishna.s.vasudevan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267392#comment-13267392 ]

ramkrishna.s.vasudevan edited comment on HBASE-5875 at 5/3/12 12:44 PM:


bq.What is the above referring to? Which part of the code?

In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager
    .processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}
bq.Can the master not detect this corner case just by looking at whats in zk?
By zk here, do you mean the RS node or the ROOT region node?

  was (Author: ram_krish):
bq.What is the above referring to? Which part of the code?

In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager
    .processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}

  
> Process RIT and Master restart may remove an online server considering it as 
> a dead server
> --
>
> Key: HBASE-5875
> URL: https://issues.apache.org/jira/browse/HBASE-5875
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.1
>
> Attachments: HBASE-5875.patch
>
>
> If, on restart, the master finds ROOT/META in RIT state, it tries to assign 
> the ROOT region through ProcessRIT.
> The master triggers the assignment and then tries to verify the root region 
> location.
> The root region location is verified by checking whether the RS has the 
> region in its online list.
> If the assignment triggered by the master has not yet completed on the RS, 
> the root region location verification fails.
> Because it failed, the master calls 
> {code}
> splitLogAndExpireIfOnline(currentRootServer);
> {code}
> which splits the log and also removes the server from the online server list. 
> Ideally there is nothing for splitlog to do here, as no region server was 
> restarted.
> So the master invalidates the region server even though it is online.
> In the special case where there is only one RS, the cluster becomes 
> non-operational.
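
The sequence described above can be illustrated with a small toy model. This is only a sketch of the race as the issue describes it, not HBase code: ToyRegionServer, ToyMaster, restartWithRootInTransition and serverSurvives are hypothetical names made up for illustration, and only splitLogAndExpireIfOnline borrows its name from the snippet quoted in the issue. The real method's behaviour is assumed from the description (remove the server from the online list).

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the race in the issue description; none of these types are HBase classes. */
class RootVerificationRace {

    /** Hypothetical stand-in for a region server. */
    static final class ToyRegionServer {
        final String name;
        private final Set<String> onlineRegions = new HashSet<>();
        private String pendingOpen; // region the master asked for, not yet online

        ToyRegionServer(String name) {
            this.name = name;
        }

        void requestOpen(String region) {
            pendingOpen = region;
        }

        void finishOpen() {
            if (pendingOpen != null) {
                onlineRegions.add(pendingOpen);
                pendingOpen = null;
            }
        }

        boolean isOnline(String region) {
            return onlineRegions.contains(region);
        }
    }

    /** Hypothetical stand-in for the master's restart path. */
    static final class ToyMaster {
        final Set<String> onlineServers = new HashSet<>();

        // Verification as described: is the region in the server's online list?
        boolean verifyRootLocation(ToyRegionServer rs) {
            return rs.isOnline("ROOT");
        }

        // Assumed effect, taken from the issue text: the server is dropped
        // from the online server list (log splitting is omitted here).
        void splitLogAndExpireIfOnline(ToyRegionServer rs) {
            onlineServers.remove(rs.name);
        }

        boolean restartWithRootInTransition(ToyRegionServer rs, boolean openCompletesBeforeVerify) {
            onlineServers.add(rs.name);   // the server is alive and registered
            rs.requestOpen("ROOT");       // master triggers the assignment
            if (openCompletesBeforeVerify) {
                rs.finishOpen();          // the server wins the race
            }
            if (!verifyRootLocation(rs)) {
                splitLogAndExpireIfOnline(rs); // healthy server treated as dead
            }
            return onlineServers.contains(rs.name);
        }
    }

    /** Returns true if the server is still in the master's online list afterwards. */
    static boolean serverSurvives(boolean openCompletesBeforeVerify) {
        return new ToyMaster()
            .restartWithRootInTransition(new ToyRegionServer("rs1"), openCompletesBeforeVerify);
    }

    public static void main(String[] args) {
        System.out.println("open finished first: server kept = " + serverSurvives(true));
        System.out.println("verify ran first:    server kept = " + serverSurvives(false));
    }
}
```

In this model the only difference between the two runs is whether the open completes before verification. When verification runs first, the single live server is removed, which corresponds to the one-RS case in the description.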

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-25 Thread ramkrishna.s.vasudevan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261814#comment-13261814 ]

ramkrishna.s.vasudevan edited comment on HBASE-5875 at 4/25/12 5:13 PM:


Updated to 0.94.1.  
{Edit} I will come up with a patch in another couple of days. {Edit}

  was (Author: ram_krish):
Updated to 0.94.1.  
  
