[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284961#comment-13284961
 ] 

Lars Hofhansl commented on HBASE-6122:
--------------------------------------

+1 patch looks good to me.
                
> Backup master does not become Active master after ZK exception
> --------------------------------------------------------------
>
>                 Key: HBASE-6122
>                 URL: https://issues.apache.org/jira/browse/HBASE-6122
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch
>
>
> -> Active master gets a ZK expiry exception.
> -> Backup master becomes active.
> -> The previous active master retries and becomes the backup master.
> Now, when the new active master goes down and the current backup master tries 
> to come up, it goes down again because of the ZK expiry exception it got in 
> the first step.
> {code}
> if (abortNow(msg, t)) {
>       if (t != null) LOG.fatal(msg, t);
>       else LOG.fatal(msg);
>       this.abort = true;
>       stop("Aborting");
>     }
> {code}
> In ActiveMasterManager.blockUntilBecomingActiveMaster we wait until the 
> backup master becomes active.
> {code}
>     synchronized (this.clusterHasActiveMaster) {
>       while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
>         try {
>           this.clusterHasActiveMaster.wait();
>         } catch (InterruptedException e) {
>           // We expect to be interrupted when a master dies, will fall out if so
>           LOG.debug("Interrupted waiting for master to die", e);
>         }
>       }
>       if (!clusterStatusTracker.isClusterUp()) {
>         this.master.stop("Cluster went down before this master became active");
>       }
>       if (this.master.isStopped()) {
>         return cleanSetOfActiveMaster;
>       }
>       // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
>     }
>     return cleanSetOfActiveMaster;
> {code}
> When the backup master (in backup mode because it got the ZK exception) once 
> again tries to become active, we discard the return value of the recursive 
> call:
> {code}
> // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
> {code}
> Instead we return the local 'cleanSetOfActiveMaster', which was previously 
> false. Because of this, instead of becoming active again, the backup master 
> goes down in the abort() code. Thanks to my colleague Gopi for reporting this 
> issue.
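
The dropped return value described above can be reproduced in isolation. What follows is a minimal sketch, not the HBase source and not necessarily what the attached patches do: the class MasterElectionSketch, both method names, and the failedAttemptsLeft parameter are hypothetical stand-ins for the real ZooKeeper-driven wait loop. It only illustrates how a recursive retry whose result is ignored reports failure even when the retry succeeded, and one plausible way to propagate the result.

```java
// Hypothetical, simplified model of the retry logic; not the HBase code.
class MasterElectionSketch {

    /**
     * Buggy shape: the recursive retry may succeed, but its result is
     * discarded and the stale local flag is returned instead.
     * failedAttemptsLeft is an assumed stand-in for "another master is
     * still active" and counts how many attempts fail before one succeeds.
     */
    static boolean becomeActiveBuggy(int failedAttemptsLeft) {
        boolean cleanSetOfActiveMaster = false;
        if (failedAttemptsLeft == 0) {
            // This attempt claimed the active master role.
            return true;
        }
        // Waiting for the active master to die is elided; then retry.
        becomeActiveBuggy(failedAttemptsLeft - 1); // return value dropped
        return cleanSetOfActiveMaster;             // still false
    }

    /** One possible fix: return whatever the retry reports. */
    static boolean becomeActiveFixed(int failedAttemptsLeft) {
        if (failedAttemptsLeft == 0) {
            return true;
        }
        return becomeActiveFixed(failedAttemptsLeft - 1);
    }

    public static void main(String[] args) {
        // One failed attempt, then a successful retry.
        System.out.println("buggy: " + becomeActiveBuggy(1));
        System.out.println("fixed: " + becomeActiveFixed(1));
    }
}
```

With one failed attempt followed by a successful retry, the buggy variant returns false, which in the scenario above is what sends the caller into abort(), while the fixed variant returns true.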

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
