Pankaj Kumar created HBASE-21535:
------------------------------------

             Summary: Zombie Master detector is not working
                 Key: HBASE-21535
                 URL: https://issues.apache.org/jira/browse/HBASE-21535
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Pankaj Kumar
            Assignee: Pankaj Kumar
HMaster has an InitializationMonitor thread that detects a zombie HMaster based on _hbase.master.initializationmonitor.timeout_ and halts the process if _hbase.master.initializationmonitor.haltontimeout_ is set to _true_.

After HBASE-19694, the HMaster initialization order was corrected. HMaster is now set active only after the ZK system trackers are initialized:

{noformat}
    status.setStatus("Initializing ZK system trackers");
    initializeZKBasedSystemTrackers();
    status.setStatus("Loading last flushed sequence id of regions");
    try {
      this.serverManager.loadLastFlushedSequenceIds();
    } catch (IOException e) {
      LOG.debug("Failed to load last flushed sequence id of regions"
          + " from file system", e);
    }
    // Set ourselves as active Master now our claim has succeeded up in zk.
    this.activeMaster = true;
{noformat}

However, the zombie detector thread is started at the very beginning of finishActiveMasterInitialization():

{noformat}
  private void finishActiveMasterInitialization(MonitoredTask status)
      throws IOException, InterruptedException, KeeperException, ReplicationException {
    Thread zombieDetector = new Thread(new InitializationMonitor(this),
        "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
    zombieDetector.setDaemon(true);
    zombieDetector.start();
    // ...
{noformat}

When the zombieDetector thread evaluates its loop condition, "master.isActiveMaster()" is still false, so the loop body never runs: the monitor neither waits for the timeout nor can it detect a zombie master.

{noformat}
    @Override
    public void run() {
      try {
        while (!master.isStopped() && master.isActiveMaster()) {
          Thread.sleep(timeout);
          if (master.isInitialized()) {
            LOG.debug("Initialization completed within allotted tolerance. Monitor exiting.");
          } else {
            LOG.error("Master failed to complete initialization after " + timeout + "ms. Please"
                + " consider submitting a bug report including a thread dump of this process.");
            if (haltOnTimeout) {
              LOG.error("Zombie Master exiting. Thread dump to stdout");
              Threads.printThreadInfo(System.out, "Zombie HMaster");
              System.exit(-1);
            }
          }
        }
      } catch (InterruptedException ie) {
        LOG.trace("InitMonitor thread interrupted. Existing.");
      }
    }
  }
{noformat}

-- 
This message was sent by Atlassian JIRA (v7.6.3#76005)
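To make the ordering problem concrete, here is a minimal, simplified Java model of it. MasterState, monitorDetectsZombie and hungMasterDetected are hypothetical stand-ins written for illustration; they are not HBase classes or APIs. The model assumes a master whose initialization hangs (initialized never becomes true) and replaces the real thread race with a fixed order: the monitor's guard is evaluated either before activeMaster is set (the order described above) or after it. Starting the detector after activeMaster is set is shown only as one possible remedy; it is an assumption, not necessarily the fix committed for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified model of the HBASE-21535 ordering problem (not HBase code). */
class ZombieDetectorSketch {

  /** Flags the monitor reads; mirrors isStopped()/isActiveMaster()/isInitialized(). */
  static final class MasterState {
    final AtomicBoolean stopped = new AtomicBoolean(false);
    final AtomicBoolean activeMaster = new AtomicBoolean(false);
    final AtomicBoolean initialized = new AtomicBoolean(false);
  }

  /**
   * One pass of the quoted monitor loop: the timeout check runs only if the
   * master is already active when the loop condition is evaluated.
   * Returns true if the monitor would flag a zombie master.
   */
  static boolean monitorDetectsZombie(MasterState m, long timeoutMs) {
    try {
      if (!m.stopped.get() && m.activeMaster.get()) {
        Thread.sleep(timeoutMs);       // give initialization timeoutMs to finish
        return !m.initialized.get();   // still not initialized: zombie
      }
      return false;                    // guard false: monitor exits without checking
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve interrupt status and give up
      return false;
    }
  }

  /**
   * Replays a master whose initialization hangs. startAfterActive=false starts
   * the detector before activeMaster=true; true starts it after the flag is set.
   * A fixed order replaces real thread scheduling so the result is deterministic.
   */
  static boolean hungMasterDetected(boolean startAfterActive) {
    MasterState m = new MasterState();
    if (startAfterActive) {
      m.activeMaster.set(true);                     // "Set ourselves as active Master"
      return monitorDetectsZombie(m, 10);
    }
    boolean detected = monitorDetectsZombie(m, 10); // activeMaster is still false here
    m.activeMaster.set(true);
    return detected;
  }

  public static void main(String[] args) {
    System.out.println("detector started first: " + hungMasterDetected(false));
    System.out.println("detector started after activeMaster: " + hungMasterDetected(true));
  }
}
```

Running main prints false for the first ordering and true for the second: with the detector started first, the hung master is never reported. The fixed ordering approximates the common case in the report, where initializing the ZK system trackers takes far longer than starting the monitor thread.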