[ 
https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267743#comment-13267743
 ] 

stack commented on HBASE-5883:
------------------------------

Can't we at least check the message to ensure it's what we expect?  (See the 
second catch below where we look for "connection reset".)  Can we be sure that 
what comes up here is the ConnectException we set down in HBaseRPC?

{code}
+      if (ioe instanceof ConnectException) {
+        // Catch. Connect refused.
{code}

This redoing of an exception seems problematic.  Is it really necessary?

{code}
+        } else if (ioex.getMessage().toLowerCase()
+            .contains("connection refused")) {
+          ce = new ConnectException(ioex.getMessage());
+          ioe = ce;
{code}

I'd feel better about this fix if we could figure out where the exception came 
from.  (It's not from the RPC stringifying of exceptions to pass them from 
server to client?)
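
For illustration only, here is a rough sketch of one way to avoid both the message matching and the re-created exception: walk the cause chain and look for a real ConnectException. The class and method names (ConnectCause, find) are hypothetical and are not part of HBase or of the attached patches. It assumes the original exception is still attached as a cause, as the "Caused by" in the quoted trace suggests. If the exception had instead been stringified on its way from server to client, the cause would be gone and the lookup would return null, which in itself would help show where the exception came from.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper, not part of HBase: looks for a ConnectException
// anywhere in the cause chain instead of matching on the message text.
final class ConnectCause {
  private ConnectCause() {}

  /** Returns the first ConnectException in the chain, or null if none. */
  static ConnectException find(Throwable t) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    for (int depth = 0; t != null && depth < 32; depth++) {
      if (t instanceof ConnectException) {
        return (ConnectException) t;
      }
      t = t.getCause();
    }
    return null;
  }

  public static void main(String[] args) {
    // Same shape as the quoted trace: IOException wrapping ConnectException.
    IOException wrapped =
        new IOException(new ConnectException("Connection refused"));
    System.out.println(find(wrapped) != null);   // prints "true"

    // Message only, no cause: what a stringified exception would look like.
    IOException flattened =
        new IOException("java.net.ConnectException: Connection refused");
    System.out.println(find(flattened) != null); // prints "false"
  }
}
```

With something along these lines the caller could keep the original exception object rather than building a new ConnectException from the message, and the message check would only be needed as a fallback for the stringified case.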
                
> Backup master is going down due to connection refused exception
> ---------------------------------------------------------------
>
>                 Key: HBASE-5883
>                 URL: https://issues.apache.org/jira/browse/HBASE-5883
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0
>            Reporter: Gopinathan A
>            Assignee: Jieshan Bean
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, 
> HBASE-5883-94.patch, HBASE-5883-trunk.patch
>
>
> The active master node's network was down for some time (this node hosts the 
> Master, DN, ZK, and RS). The backup node got the 
> notification and started to become active. Immediately afterwards, the backup 
> node aborted with the exception below.
> {noformat}
> 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
> finished splitting (more than or equal to) 861248320 bytes in 4 log files in 
> [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting]
>  in 26374ms
> 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master 
> server abort: loaded coprocessors are: []
> 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> java.io.IOException: java.net.ConnectException: Connection refused
>       at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
>       at 
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
>       at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>       at $Proxy13.getProtocolVersion(Unknown Source)
>       at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
>       at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
>       at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
>       at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
>       at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
>       at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
>       at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
>       at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
>       at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
>       ... 20 more
> 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: 
> Stopping service threads
> {noformat}


        
