[ https://issues.apache.org/jira/browse/ZOOKEEPER-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872521#comment-15872521 ]
ASF GitHub Bot commented on ZOOKEEPER-2691: ------------------------------------------- Github user eribeiro commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/173#discussion_r101845264 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -181,6 +197,33 @@ public void recreateSocketAddresses() { } } + /** + * Resolve the hostname to IP addresses, and find one reachable address. + * + * @param hostname the name of the host + * @param timeout the time, in millseconds, before {@link InetAddress#isReachable} + * aborts + * @return a reachable IP address. If no such IP address can be found, + * just return the first IP address of the hostname. + * + * @exception UnknownHostException + */ + public InetAddress getReachableAddress(String hostname, int timeout) + throws UnknownHostException { + InetAddress[] addresses = InetAddress.getAllByName(hostname); + for (InetAddress a : addresses) { + try { + if (a.isReachable(timeout)) { --- End diff -- My main problem with this PR is that call to `isReachable(timeout)` for two reasons: 1) the most important one: `isReachable(timeout)` seems unreliable so there are plenty cases where it returns false even tough the node is reachable or vice-versa! https://bugs.openjdk.java.net/browse/JDK-8159410 (google "InetAddress.isReachable not working" or "InetAddress.isReachable unreliable" to see further cases). 2) This timeout can add an arbitrary delay until a reachable node can be tested. IDK what a good compromise would be for both points above (leaving as it is today could work, so no problem, even tough I am a bit concerned), but maybe we could use a solution similar to ZOOKEEPER-2184 and return the next address in the array (using `next = ++next % addresses.length` to prevent out of bound exceptions). > recreateSocketAddresses may recreate the unreachable IP address > --------------------------------------------------------------- > > Key: ZOOKEEPER-2691 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2691 > Project: ZooKeeper > Issue Type: Bug > Affects Versions: 3.4.8 > Environment: Centos6.5 > Java8 > ZooKeeper3.4.8 > Reporter: JiangJiafu > Priority: Minor > > The QuorumPeer$QuorumServer.recreateSocketAddress() is used to resolved the > hostname to a new IP address(InetAddress) when any exception happens to the > socket. It will be very useful when a hostname can be resolved to more than > one IP address. > But the problem is Java API InetAddress.getByName(String hostname) will > always return the first IP address when the hostname can be resolved to more > than one IP address, and the first IP address may be unreachable forever. For > example, if a machine has two network interfaces: eth0, eth1, say eth0 has > ip1, eth1 has ip2, the relationship between hostname and the IP addresses is > set in /etc/hosts. When I "close" the eth0 by command "ifdown eth0", the > InetAddress.getByName(String hostname) will still return ip1, which is > unreachable forever. > So I think it will be better to check the IP address by > InetAddress.isReachable(long) and choose the reachable IP address. > I have modified the ZooKeeper source code, and test the new code in my own > environment, and it can work very well when I turn down some network > interfaces using "ifdown" command. > The original code is: > {code:title=QuorumPeer.java|borderStyle=solid} > public void recreateSocketAddresses() { > InetAddress address = null; > try { > address = InetAddress.getByName(this.hostname); > LOG.info("Resolved hostname: {} to address: {}", > this.hostname, address); > this.addr = new InetSocketAddress(address, this.port); > if (this.electionPort > 0){ > this.electionAddr = new InetSocketAddress(address, > this.electionPort); > } > } catch (UnknownHostException ex) { > LOG.warn("Failed to resolve address: {}", this.hostname, ex); > // Have we succeeded in the past? > if (this.addr != null) { > // Yes, previously the lookup succeeded. Leave things as > they are > return; > } > // The hostname has never resolved. Create our > InetSocketAddress(es) as unresolved > this.addr = InetSocketAddress.createUnresolved(this.hostname, > this.port); > if (this.electionPort > 0){ > this.electionAddr = > InetSocketAddress.createUnresolved(this.hostname, > > this.electionPort); > } > } > } > {code} > After my modification: > {code:title=QuorumPeer.java|borderStyle=solid} > public void recreateSocketAddresses() { > InetAddress address = null; > try { > address = getReachableAddress(this.hostname); > LOG.info("Resolved hostname: {} to address: {}", > this.hostname, address); > this.addr = new InetSocketAddress(address, this.port); > if (this.electionPort > 0){ > this.electionAddr = new InetSocketAddress(address, > this.electionPort); > } > } catch (UnknownHostException ex) { > LOG.warn("Failed to resolve address: {}", this.hostname, ex); > // Have we succeeded in the past? > if (this.addr != null) { > // Yes, previously the lookup succeeded. Leave things as > they are > return; > } > // The hostname has never resolved. Create our > InetSocketAddress(es) as unresolved > this.addr = InetSocketAddress.createUnresolved(this.hostname, > this.port); > if (this.electionPort > 0){ > this.electionAddr = > InetSocketAddress.createUnresolved(this.hostname, > > this.electionPort); > } > } > } > public InetAddress getReachableAddress(String hostname) throws > UnknownHostException { > InetAddress[] addresses = InetAddress.getAllByName(hostname); > for (InetAddress a : addresses) { > try { > if (a.isReachable(5000)) { > return a; > } > } catch (IOException e) { > LOG.warn("IP address {} is unreachable", a); > } > } > // All the IP address is unreachable, just return the first one. > return addresses[0]; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)