ramkrishna.s.vasudevan created HBASE-6122:
---------------------------------------------

             Summary: Backup master does not become Active master after ZK exception
                 Key: HBASE-6122
                 URL: https://issues.apache.org/jira/browse/HBASE-6122
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.0
            Reporter: ramkrishna.s.vasudevan
             Fix For: 0.96.0, 0.94.1


-> The active master gets a ZK expiry exception.
-> The backup master becomes active.
-> The previous active master retries and becomes the backup master.
Now, when the new active master goes down and the current backup master tries 
to take over, it goes down again because of the ZK expiry exception it got in 
the first step.

{code}
    if (abortNow(msg, t)) {
      if (t != null) LOG.fatal(msg, t);
      else LOG.fatal(msg);
      this.abort = true;
      stop("Aborting");
    }
{code}
In ActiveMasterManager.blockUntilBecomingActiveMaster, the backup master waits 
until it can become the active master.
{code}
    synchronized (this.clusterHasActiveMaster) {
      while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
        try {
          this.clusterHasActiveMaster.wait();
        } catch (InterruptedException e) {
          // We expect to be interrupted when a master dies, will fall out if so
          LOG.debug("Interrupted waiting for master to die", e);
        }
      }
      if (!clusterStatusTracker.isClusterUp()) {
        this.master.stop("Cluster went down before this master became active");
      }
      if (this.master.isStopped()) {
        return cleanSetOfActiveMaster;
      }
      // Try to become active master again now that there is no active master
      blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
    }
    return cleanSetOfActiveMaster;
{code}
When the backup master (which is in backup mode because it got the ZK 
exception) once again tries to become active, the return value of the 
recursive call is discarded:
{code}
      // Try to become active master again now that there is no active master
      blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
{code}
We end up returning the local 'cleanSetOfActiveMaster', which was previously 
false. Because of this, instead of becoming active again, the backup master 
goes down in the abort() code.  Thanks to my colleague Gopi for reporting this 
issue.
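
The dropped return value can be reproduced in isolation. The following is a minimal sketch, not HBase code: the class ActiveMasterRetrySketch, the helper tryBecomeActive, and the attempt counter are hypothetical stand-ins for the ZK logic, and the "fixed" variant is only one possible way to propagate the result, not necessarily the patch that gets committed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ActiveMasterRetrySketch {

    // Hypothetical stand-in for the ZK attempt to become active:
    // fails on the first call, succeeds on every later call.
    static boolean tryBecomeActive(AtomicInteger attempts) {
        return attempts.incrementAndGet() > 1;
    }

    // Mirrors the reported pattern: the result of the recursive retry is
    // discarded and the stale local value (false) is returned.
    static boolean blockUntilActiveBuggy(AtomicInteger attempts) {
        boolean cleanSetOfActiveMaster = tryBecomeActive(attempts);
        if (cleanSetOfActiveMaster) {
            return true;
        }
        // (waiting for the current active master to die is omitted here)
        blockUntilActiveBuggy(attempts);  // return value dropped
        return cleanSetOfActiveMaster;    // still false from the first attempt
    }

    // One possible fix (an assumption): return what the retry produced.
    static boolean blockUntilActiveFixed(AtomicInteger attempts) {
        boolean cleanSetOfActiveMaster = tryBecomeActive(attempts);
        if (cleanSetOfActiveMaster) {
            return true;
        }
        return blockUntilActiveFixed(attempts);
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + blockUntilActiveBuggy(new AtomicInteger()));
        System.out.println("fixed: " + blockUntilActiveFixed(new AtomicInteger()));
    }
}
```

In the buggy variant the retry succeeds, yet the caller still sees false, which matches the behaviour described above where the master takes the abort path instead of becoming active.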
