[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287235#comment-13287235
 ] 

Hudson commented on HBASE-6122:
-------------------------------

Integrated in HBase-0.92-security #109 (See [https://builds.apache.org/job/HBase-0.92-security/109/])
    HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344799)
    HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344466)
    HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344350)

     Result = SUCCESS
ramkrishna : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java

ramkrishna : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java

                
> Backup master does not become Active master after ZK exception
> --------------------------------------------------------------
>
>                 Key: HBASE-6122
>                 URL: https://issues.apache.org/jira/browse/HBASE-6122
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.94.1
>
>         Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, 
> HBASE-6122_0.94.patch, HBASE-6122_0.94.patch
>
>
> -> The active master gets a ZK session expiry exception.
> -> The backup master becomes active.
> -> The previous active master retries and becomes the backup master.
> Now, when the new active master goes down and the current backup master 
> tries to take over, it goes down as well because of the ZK expiry exception 
> it got in the first step.
> {code}
> if (abortNow(msg, t)) {
>       if (t != null) LOG.fatal(msg, t);
>       else LOG.fatal(msg);
>       this.abort = true;
>       stop("Aborting");
>     }
> {code}
> In ActiveMasterManager.blockUntilBecomingActiveMaster we wait until the 
> backup master becomes active. 
> {code}
>     synchronized (this.clusterHasActiveMaster) {
>       while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
>         try {
>           this.clusterHasActiveMaster.wait();
>         } catch (InterruptedException e) {
>           // We expect to be interrupted when a master dies, will fall out if so
>           LOG.debug("Interrupted waiting for master to die", e);
>         }
>       }
>       if (!clusterStatusTracker.isClusterUp()) {
>         this.master.stop("Cluster went down before this master became active");
>       }
>       if (this.master.isStopped()) {
>         return cleanSetOfActiveMaster;
>       }
>       // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
>     }
>     return cleanSetOfActiveMaster;
> {code}
> When the backup master (it is in backup mode because it got the ZK 
> exception) once again tries to become active, we do not use the return 
> value of the recursive call:
> {code}
> // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
> {code}
> Instead we return 'cleanSetOfActiveMaster', which was previously set to 
> false. Because of this, instead of becoming active again, the backup master 
> goes down in the abort() code. Thanks to Gopi, my colleague, for reporting 
> this issue.
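
The discarded return value described in the issue can be reproduced in isolation. The following is only a minimal sketch, not the HBase code and not necessarily what the attached patches do: the class ActiveMasterSketch, both method names, and the AtomicInteger counter standing in for "failed attempts before the master znode can be taken" are hypothetical. It shows that a retry whose result is dropped leaves the stale 'false' in place, while returning the retry's result propagates the success.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ActiveMasterSketch {

    /**
     * Buggy shape: the retry may succeed, but its result is thrown away
     * and the stale local value from the first attempt is returned.
     * failuresLeft is a hypothetical stand-in for the ZK state.
     */
    static boolean becomeActiveBuggy(AtomicInteger failuresLeft) {
        // An attempt succeeds only once no failures remain.
        boolean cleanSetOfActiveMaster = failuresLeft.getAndDecrement() <= 0;
        if (cleanSetOfActiveMaster) {
            return true;
        }
        // Try again, but ignore what the retry reports.
        becomeActiveBuggy(failuresLeft);
        return cleanSetOfActiveMaster; // still false from the first attempt
    }

    /** One possible fix: hand the result of the retry back to the caller. */
    static boolean becomeActiveFixed(AtomicInteger failuresLeft) {
        boolean cleanSetOfActiveMaster = failuresLeft.getAndDecrement() <= 0;
        if (cleanSetOfActiveMaster) {
            return true;
        }
        return becomeActiveFixed(failuresLeft);
    }

    public static void main(String[] args) {
        // One failed attempt, then the retry succeeds.
        System.out.println("buggy: " + becomeActiveBuggy(new AtomicInteger(1)));
        System.out.println("fixed: " + becomeActiveFixed(new AtomicInteger(1)));
    }
}
```

With one simulated failure, the buggy variant reports false even though its retry succeeded, which corresponds to the caller falling into the abort() path; the fixed variant reports true.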

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira