[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988088#action_12988088
 ] 

Hugh Warrington commented on ZOOKEEPER-979:
-------------------------------------------

Ok, I think I've got to the bottom of this. We were programmatically building a 
java.util.Properties object, using

for (InetAddress host : hosts) {
    properties.put(String.format("server.%d", i++),
        String.format("%s:2888:3888", host.toString()));
}

This was building property values of the form

/10.0.0.1:2888:3888

Notice the leading slash, which comes from InetAddress.toString(). We then 
passed the Properties object into QuorumPeerConfig.parseProperties(), which 
duly constructs an InetSocketAddress with hostname '/10.0.0.1' and port 3888. 
Because the hostname starts with a bogus character, the attempt to resolve it 
fails, so the resulting electionAddr.isUnresolved() will be true.
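
To make the leading slash concrete, here is a minimal sketch comparing 
InetAddress.toString() with InetAddress.getHostAddress(). The class and method 
names are hypothetical and only for illustration; the 2888/3888 ports are the 
ones from the loop above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ServerAddressFormat {
    // Variant from the loop above: InetAddress.toString() renders
    // "hostname/literal-IP", so an address without a hostname starts with "/".
    static String viaToString(InetAddress host) {
        return String.format("%s:2888:3888", host.toString());
    }

    // Alternative that uses only the textual IP address.
    static String viaHostAddress(InetAddress host) {
        return String.format("%s:2888:3888", host.getHostAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress(byte[]) builds the address without any name lookup.
        InetAddress host = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        System.out.println(viaToString(host));    // /10.0.0.1:2888:3888
        System.out.println(viaHostAddress(host)); // 10.0.0.1:2888:3888
    }
}
```

If the hosts are known by name rather than by IP, getHostName() may be the 
better choice, at the cost of a possible reverse lookup.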

Everything then continues until the first attempt is made to do 
Socket.connect() with that InetSocketAddress. At this point, some undocumented 
behaviour in the Socket class comes into play. In 
sun.nio.ch.SocketAdaptor.connect() (line 140 in openjdk 1.6.0_17 that I'm 
using) it calls Net.translateException(), which takes the 
UnresolvedAddressException and instead throws an UnknownHostException. The 
rationale behind this seems to be that UnresolvedAddressException is an unchecked 
exception, and they want to throw an IOException ("Throw UnknownHostException 
from here since it cannot be thrown as a SocketException"). So instead they 
just obscure the true source of the problem, and the developer is none the 
wiser. It doesn't seem to be stated anywhere, but apparently you may only call 
Socket.connect() with a resolved InetSocketAddress.
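
One way a caller could guard against this is to check isUnresolved() before 
connecting and fail with a message that names the address. This is only a 
sketch: requireResolved() is a hypothetical helper, not something that exists 
in ZooKeeper or the JDK.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class ResolvedAddressCheck {
    // Hypothetical helper: reject unresolved addresses up front and include
    // the offending host and port in the exception message.
    static InetSocketAddress requireResolved(InetSocketAddress addr)
            throws UnknownHostException {
        if (addr.isUnresolved()) {
            throw new UnknownHostException(
                "Unresolved address: " + addr.getHostString() + ":" + addr.getPort());
        }
        return addr;
    }

    public static void main(String[] args) throws UnknownHostException {
        // createUnresolved() never attempts a lookup, so this mimics the
        // state of the address built from "/10.0.0.1".
        InetSocketAddress bad = InetSocketAddress.createUnresolved("/10.0.0.1", 3888);
        try {
            requireResolved(bad);
        } catch (UnknownHostException e) {
            System.out.println(e.getMessage());
        }

        // An address built from raw bytes is already resolved.
        InetAddress ip = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        System.out.println(requireResolved(new InetSocketAddress(ip, 3888)).getPort());
    }
}
```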

Anyway, it seems to me the thing to do here would be to try to resolve the 
provided server addresses much earlier. Perhaps even in QuorumPeerConfig, via 
InetAddress.getByName().
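
As a rough sketch of what resolving early could look like, the snippet below 
parses a "host:port:port" value and calls InetAddress.getByName() straight 
away, so a bad host is reported together with the configuration value that 
contained it. The class and method names are hypothetical, and this is not how 
QuorumPeerConfig is actually written; it also assumes the host contains no 
colons (so no IPv6 literals).

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class EarlyResolve {
    // Parses "host:quorumPort:electionPort" and resolves the host immediately,
    // returning the election address.
    static InetSocketAddress electionAddress(String value) {
        String[] parts = value.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected host:port:port but got: " + value);
        }
        try {
            // A literal IP is only format-checked; a name triggers a lookup.
            InetAddress host = InetAddress.getByName(parts[0]);
            return new InetSocketAddress(host, Integer.parseInt(parts[2]));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(
                "Cannot resolve host in: " + value, e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress addr = electionAddress("10.0.0.1:2888:3888");
        System.out.println(addr.isUnresolved()); // false
    }
}
```

With a check like this, a value such as "/10.0.0.1:2888:3888" would be rejected 
while the configuration is being parsed rather than at the first connect().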

> UnknownHostException in QuorumCnxManager
> ----------------------------------------
>
>                 Key: ZOOKEEPER-979
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-979
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Hugh Warrington
>            Priority: Minor
>
> I'm using zk 3.3.2 and I'm seeing this in my logs around startup:
> 2011-01-27 10:16:21,513 [WorkerSender Thread] WARN  
> org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 
> 0 at election address xxx.yyy.com/10.2.131.19:3888
> java.net.UnknownHostException
>       at sun.nio.ch.Net.translateException(Net.java:100)
>       at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:140)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
>       at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
>       at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
>       at java.lang.Thread.run(Thread.java:636)
> And all subsequent zk ops give {{ConnectionLossException}}.
> I've just explained this to breed_zk on IRC, and he asked me to file a 
> ticket, mentioning that UnknownHostException may sometimes be thrown for 
> reasons other than host resolution. While I'm reasonably certain that the 
> hostname is correct and should be contactable, I need to put some more time 
> into checking our network setup to be absolutely sure. However, two 
> observations arose while looking into this:
> * At the top of QuorumCnxManager.connectOne(), we set electionAddr (or fail 
> and return). But then a few lines later we don't actually use this local 
> variable in the call to connect(). This seems like a minor programming 
> mistake (although AFAICT it doesn't change the behaviour).
> * In the subsequent catch block, the UnknownHostException that's thrown 
> doesn't contain the address that we were trying to connect to (though if you 
> capture WARN log messages, you can see what it was).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
