Pankaj Kumar created HBASE-21535:
------------------------------------

             Summary: Zombie Master detector is not working
                 Key: HBASE-21535
                 URL: https://issues.apache.org/jira/browse/HBASE-21535
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Pankaj Kumar
            Assignee: Pankaj Kumar


HMaster has an InitializationMonitor thread that detects a zombie HMaster 
based on _hbase.master.initializationmonitor.timeout_ and halts the process if 
_hbase.master.initializationmonitor.haltontimeout_ is set to _true_.

After HBASE-19694, the HMaster initialization order was corrected. HMaster is 
now set active only after initializing the ZK system trackers, as follows:
{noformat}
    status.setStatus("Initializing ZK system trackers");
    initializeZKBasedSystemTrackers();
    status.setStatus("Loading last flushed sequence id of regions");
    try {
      this.serverManager.loadLastFlushedSequenceIds();
    } catch (IOException e) {
      LOG.debug("Failed to load last flushed sequence id of regions"
          + " from file system", e);
    }
    // Set ourselves as active Master now our claim has succeeded up in zk.
    this.activeMaster = true;
{noformat}

But the zombie detector thread is started at the very beginning of 
finishActiveMasterInitialization():
{noformat}
  private void finishActiveMasterInitialization(MonitoredTask status) throws IOException,
      InterruptedException, KeeperException, ReplicationException {
    Thread zombieDetector = new Thread(new InitializationMonitor(this),
        "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
    zombieDetector.setDaemon(true);
    zombieDetector.start();
{noformat}

When zombieDetector starts running, master.isActiveMaster() is still false, so 
the while loop is never entered: the monitor does not wait for the timeout and 
cannot detect a zombie master.
{noformat}
    @Override
    public void run() {
      try {
        while (!master.isStopped() && master.isActiveMaster()) {
          Thread.sleep(timeout);
          if (master.isInitialized()) {
            LOG.debug("Initialization completed within allotted tolerance. Monitor exiting.");
          } else {
            LOG.error("Master failed to complete initialization after " + timeout + "ms. Please"
                + " consider submitting a bug report including a thread dump of this process.");
            if (haltOnTimeout) {
              LOG.error("Zombie Master exiting. Thread dump to stdout");
              Threads.printThreadInfo(System.out, "Zombie HMaster");
              System.exit(-1);
            }
          }
        }
      } catch (InterruptedException ie) {
        LOG.trace("InitMonitor thread interrupted. Existing.");
      }
    }
  }
{noformat}
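
The ordering problem described above can be reproduced outside HBase with a 
small sketch. This is not HBase code: ZombieMonitorSketch, FakeMaster and 
runMonitor are hypothetical names made up for illustration, and the loop is 
simplified to return a result instead of logging or calling System.exit. It 
only assumes the same loop condition as InitializationMonitor.run().

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ZombieMonitorSketch {

    /** Hypothetical stand-in for the HMaster flags that the monitor reads. */
    static final class FakeMaster {
        final AtomicBoolean stopped = new AtomicBoolean(false);
        final AtomicBoolean active = new AtomicBoolean(false);
        final AtomicBoolean initialized = new AtomicBoolean(false);
    }

    /**
     * Same loop condition as InitializationMonitor.run(), simplified.
     * Returns true if the monitor waited for the timeout and found the
     * master still uninitialized (the point where the real code would log
     * an error and possibly halt). Returns false otherwise, including the
     * case where the loop body is never entered.
     */
    static boolean runMonitor(FakeMaster master, long timeoutMs) {
        try {
            while (!master.stopped.get() && master.active.get()) {
                Thread.sleep(timeoutMs);
                // Simplification: report the first check instead of looping.
                return !master.initialized.get();
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        FakeMaster master = new FakeMaster();

        // Monitor started before the master is marked active (the reported bug):
        // the loop condition is false on the first check, so nothing is detected.
        System.out.println("active=false at start: detected=" + runMonitor(master, 50));

        // Monitor started after the master is marked active: the timeout elapses
        // while initialization is still incomplete, so the zombie is detected.
        master.active.set(true);
        System.out.println("active=true at start: detected=" + runMonitor(master, 50));
    }
}
```

In the first call the monitor returns immediately without sleeping, which is 
the behaviour reported in this issue. In the second call it sleeps for the 
timeout and then reports the uninitialized master.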



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
