[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhihong Yu updated HBASE-5572: ------------------------------ Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolved as part of HBASE-5549 > KeeperException.SessionExpiredException management could be improved in Master > ------------------------------------------------------------------------------ > > Key: HBASE-5572 > URL: https://issues.apache.org/jira/browse/HBASE-5572 > Project: HBase > Issue Type: Improvement > Components: master > Affects Versions: 0.96.0 > Reporter: nkeywal > Assignee: nkeywal > Priority: Minor > Fix For: 0.96.0 > > Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, > 5572.v2.patch > > > Synthesis: > 1) TestMasterZKSessionRecovery distinguish two cases on > SessionExpiredException. One is explicitly not managed. However, is seems > that there is no reason for this. > 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a > quite complex function, with a useless recursive call. > 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is > equivalent to TestZooKeeper#testMasterSessionExpired > 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be > removed if we merge the two cases mentioned above. > Changes are: > 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a > single case and remove recursion > 1) Removing TestMasterZKSessionRecovery > Detailed justification: > testMasterZKSessionRecoveryFailure says: > {noformat} > /** > * Negative test of master recovery from zk session expiry. > * > * Starts with one master. Fakes the master zk session expired. > * Ensures the master cannot recover the expired zk session since > * the master zk node is still there. > */ > public void testMasterZKSessionRecoveryFailure() throws Exception { > MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); > HMaster m = cluster.getMaster(); > m.abort("Test recovery from zk session expired", > new KeeperException.SessionExpiredException()); > assertTrue(m.isStopped()); > } > {noformat} > This tests works, i.e. the assertion is always verified. > But do we really want this behavior? > When looking at the code, we see that this what's happening is strange: > - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false > HMaster#abort stops the master. > - HMaster#abortNow checks the exception type. As it's a > SessionExpiredException it will try to recover, calling > HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false > (and that will make HMaster#abort stopping the master) > - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and > then try to become the active master. If it cannot, it will return false (and > that will make HMaster#abort stopping the master). > - HMaster#becomeActiveMaster returns the result of > ActiveMasterManager#blockUntilBecomingActiveMaster. > blockUntilBecomingActiveMaster says it will return false if there is any > error preventing it to become the active master. > - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master > address. If it's the same port & host, it deletes the nodes, that will start > a recursive call to blockUntilBecomingActiveMaster. This second call succeeds > (we became the active master) and return true. This result is ignored by the > first blockUntilBecomingActiveMaster: it return false (even if we actually > became the active master), hence the whole suite call returns false and > HMaster#abort stops the master. > In other words, the comment says "Ensures the master cannot recover the > expired zk session since the master zk node is still there." but we're > actually doing a check just for this and deleting the node. If we were not > ignoring the result, we would return "true", so we would not stop the master, > so the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira