[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286334#comment-13286334
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-6122 at 5/31/12 5:26 AM:
------------------------------------------------------------------------

I checked the test case.
The flow does make the master become active, but the problem described in 
this JIRA still causes the master to go down.

I added two log statements in ActiveMasterManager.blockUntilBecomingActiveMaster:
{code}
        LOG.info("Master is now available "+this.sn);
        this.clusterHasActiveMaster.set(true);
        LOG.info("Master=" + this.sn);
        return cleanSetOfActiveMaster;
{code}
The following lines then appear in the logs:
{code}
2012-05-31 10:52:29,050 INFO  [pool-29-thread-1] master.ActiveMasterManager(149): Master is now available Htipl-01388.china.huawei.com,3569,1338441734226
2012-05-31 10:52:29,050 INFO  [pool-29-thread-1] master.ActiveMasterManager(151): Master=Htipl-01388.china.huawei.com,3569,1338441734226
{code}
This means the master should come up as long as nothing prevents it from 
becoming active again.  Along with the patch, this test case should be modified 
to change the assertTrue to assertFalse.

Please correct me if I am wrong.  The fix remains valid.
                
                  
> Backup master does not become Active master after ZK exception
> --------------------------------------------------------------
>
>                 Key: HBASE-6122
>                 URL: https://issues.apache.org/jira/browse/HBASE-6122
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.94.1
>
>         Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch
>
>
> -> The active master gets a ZK expiry exception.
> -> The backup master becomes active.
> -> The previous active master retries and becomes the backup master.
> Now when the new active master goes down and the current backup master comes 
> up, it goes down again with the ZK expiry exception it got in the first step.
> {code}
> if (abortNow(msg, t)) {
>       if (t != null) LOG.fatal(msg, t);
>       else LOG.fatal(msg);
>       this.abort = true;
>       stop("Aborting");
>     }
> {code}
> In ActiveMasterManager.blockUntilBecomingActiveMaster we wait until the 
> backup master becomes active:
> {code}
>     synchronized (this.clusterHasActiveMaster) {
>       while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
>         try {
>           this.clusterHasActiveMaster.wait();
>         } catch (InterruptedException e) {
>           // We expect to be interrupted when a master dies, will fall out if so
>           LOG.debug("Interrupted waiting for master to die", e);
>         }
>       }
>       if (!clusterStatusTracker.isClusterUp()) {
>         this.master.stop("Cluster went down before this master became active");
>       }
>       if (this.master.isStopped()) {
>         return cleanSetOfActiveMaster;
>       }
>       // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
>     }
>     return cleanSetOfActiveMaster;
> {code}
> When the backup master (it is in backup mode because it got the ZK exception) 
> once again tries to become active, we do not use the return value that comes 
> out of
> {code}
> // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
> {code}
> We return the local 'cleanSetOfActiveMaster', which was previously false.
> Because of this, instead of becoming active again, the backup master goes 
> down in the abort() code.  Thanks to Gopi, my colleague, for reporting this 
> issue.
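
The discarded return value described above can be reproduced in isolation. The sketch below is not HBase code: the class RetrySketch, its methods, and the failure counter are hypothetical stand-ins that only mimic the shape of the recursive retry. It contrasts returning the stale local flag with propagating the result of the retry, which is one possible shape of a fix; the attached patches may do it differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the retry logic; not the real ActiveMasterManager. */
class RetrySketch {
    // Number of attempts that fail before this "master" can become active.
    private final AtomicInteger failuresLeft;

    RetrySketch(int failures) {
        this.failuresLeft = new AtomicInteger(failures);
    }

    /** Simulates one attempt at becoming the active master. */
    private boolean tryBecomeActive() {
        return failuresLeft.getAndDecrement() <= 0;
    }

    /** Buggy shape: the result of the recursive retry is discarded. */
    boolean blockUntilActiveBuggy() {
        boolean cleanSetOfActiveMaster = tryBecomeActive();
        if (!cleanSetOfActiveMaster) {
            blockUntilActiveBuggy();   // return value dropped here
        }
        return cleanSetOfActiveMaster; // still the value from the first attempt
    }

    /** Assumed fix shape: propagate the result of the retry to the caller. */
    boolean blockUntilActiveFixed() {
        boolean cleanSetOfActiveMaster = tryBecomeActive();
        if (!cleanSetOfActiveMaster) {
            cleanSetOfActiveMaster = blockUntilActiveFixed();
        }
        return cleanSetOfActiveMaster;
    }
}
```

With one simulated failure, blockUntilActiveBuggy() reports false even though the retry succeeded, while blockUntilActiveFixed() reports true. A caller that aborts on a false result would therefore go down only in the buggy variant.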
