[ https://issues.apache.org/jira/browse/HBASE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763306#comment-13763306 ]
Hudson commented on HBASE-9451:
-------------------------------

SUCCESS: Integrated in hbase-0.96 #29 (See [https://builds.apache.org/job/hbase-0.96/29/])
HBASE-9451 Meta remains unassigned when the meta server crashes with the ClusterStatusListener set (nkeywal: rev 1521526)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


> Meta remains unassigned when the meta server crashes with the ClusterStatusListener set
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-9451
>                 URL: https://issues.apache.org/jira/browse/HBASE-9451
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Nicolas Liochon
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9451.v1.patch
>
>
> While running tests described in HBASE-9338, ran into this problem. The
> hbase.status.listener.class was set to
> org.apache.hadoop.hbase.client.ClusterStatusListener$MultiCastListener.
> 1. I had the meta server coming down
> 2. The metaSSH got triggered. The call chain:
>    2.1 verifyAndAssignMetaWithRetries
>    2.2 verifyMetaRegionLocation
>    2.3 waitForMetaServerConnection
>    2.4 getMetaServerConnection
>    2.5 getCachedConnection
>    2.6 HConnectionManager.getAdmin(serverName, false)
>    2.7 isDeadServer(serverName) -> This is hardcoded to return 'false' when
> the clusterStatusListener field is null. If clusterStatusListener is not null
> (in my test), then it could return true in certain cases (and in this case,
> indeed it should return true since the server is down). I am trying to
> understand why it's hardcoded to 'false' for former case.
> 3. When isDeadServer returns true, the method
> HConnectionManager.getAdmin(ServerName, boolean) throws
> RegionServerStoppedException.
> 4. Finally, after the retries are over verifyAndAssignMetaWithRetries gives
> up and the master aborts.
> The methods in the above call chain don't handle
> RegionServerStoppedException. Maybe something to look at...
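
The failure path in steps 2.7 through 4 can be sketched in isolation. This is only an illustration of the idea that a "server stopped" signal could be read as "meta is not at that location" rather than left to propagate; it is not the content of 9451.v1.patch. All names here (MetaVerifySketch, ServerStoppedException, and the simplified method signatures) are hypothetical stand-ins and not the real HBase classes or APIs.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;

public class MetaVerifySketch {

    /** Hypothetical stand-in for RegionServerStoppedException; not the HBase class. */
    static class ServerStoppedException extends IOException {
        ServerStoppedException(String msg) {
            super(msg);
        }
    }

    /** Simplified dead-server check (step 2.7): true if the server is known to be down. */
    static boolean isDeadServer(Set<String> deadServers, String serverName) {
        return deadServers.contains(serverName);
    }

    /** Mirrors step 3: refuses to return a connection for a server known to be dead. */
    static String getAdmin(Set<String> deadServers, String serverName)
            throws ServerStoppedException {
        if (isDeadServer(deadServers, serverName)) {
            throw new ServerStoppedException(serverName + " is dead");
        }
        return "admin:" + serverName; // placeholder for a real admin connection
    }

    /** Returns true only if the recorded meta location is still reachable. */
    static boolean verifyMetaRegionLocation(Set<String> deadServers, String metaServer) {
        try {
            getAdmin(deadServers, metaServer);
            return true;
        } catch (ServerStoppedException e) {
            // Assumption of this sketch: a stopped server cannot host meta, so
            // report "not verified" and let the caller go on to reassign it.
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> dead = Collections.singleton("rs1");
        System.out.println("rs1 verified: " + verifyMetaRegionLocation(dead, "rs1"));
        System.out.println("rs2 verified: " + verifyMetaRegionLocation(dead, "rs2"));
    }
}
```

With the exception caught at the verification step, a caller that retries sees a plain "not verified" result for the dead server and can proceed to assignment, instead of using up its retries on the same exception.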