Rajini Sivaram created ZOOKEEPER-3100:
-----------------------------------------
Summary: ZooKeeper client times out due to random choice of
resolved addresses
Key: ZOOKEEPER-3100
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3100
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.13
Reporter: Rajini Sivaram
The changes made to ZooKeeper clients under ZOOKEEPER-2184 to re-resolve hosts
cause delays when only a subset of the addresses that a host resolves to
is actually reachable. This can result in connection timeouts on the client.
For example, when running tests with a single ZooKeeper server accepting
connections on 127.0.0.1 on a host that has both IPv4 and IPv6, we have seen
connection timeouts in tests if the client connects using `localhost` rather
than `127.0.0.1`. The ZooKeeper client resolves `localhost` to both the IPv4
and IPv6 addresses and chooses one at random. If the IPv6 address is chosen,
the connection fails and a fixed one-second backoff is applied before the
retry, since only one hostname is specified. After the backoff, `localhost` is
resolved again and a random address is chosen, which could again be the
unreachable IPv6 address.
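To give a rough sense of the cost, the sketch below models the random choice as independent, uniform picks among the resolved addresses. This is a simplification for illustration only: the class and method names are hypothetical, it is not ZooKeeper client code, and it counts only the backoff, not the time the failed connection attempt itself takes.

```java
// Illustrative model only; hypothetical names, not part of the ZooKeeper client.
class RandomChoiceDelayModel {

    /**
     * Expected number of failed attempts before the first success when each
     * attempt picks uniformly at random among `total` addresses, of which
     * `reachable` accept connections: (total - reachable) / reachable.
     */
    static double expectedFailedAttempts(int total, int reachable) {
        if (reachable <= 0 || reachable > total) {
            throw new IllegalArgumentException("need 0 < reachable <= total");
        }
        return (double) (total - reachable) / reachable;
    }

    /** Probability that the first n attempts all pick an unreachable address. */
    static double probabilityOfConsecutiveFailures(int total, int reachable, int n) {
        double failureProbability = (double) (total - reachable) / total;
        return Math.pow(failureProbability, n);
    }

    /** Expected total backoff, assuming one fixed backoff per failed attempt. */
    static double expectedBackoffMillis(int total, int reachable, long backoffMillis) {
        return expectedFailedAttempts(total, reachable) * backoffMillis;
    }
}
```

Under these assumptions, the `localhost` case above (two addresses, one reachable, one-second backoff) averages one failed attempt per connection, and one connection in eight would hit three failures in a row.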
Across the list of hostnames specified for connection, clients already use
round-robin without backoff until a connection to every hostname has been
attempted. Can we do the same for the addresses that each host resolves to, so
that a backoff is applied only after every address has been attempted once,
with addresses tried in round-robin order rather than selected at random?
This would avoid delays whenever at least one address is reachable.
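A minimal sketch of the requested behaviour, assuming the resolved addresses for a host are already available as a list. `RoundRobinCursor` and its methods are hypothetical names used for illustration; this is not the existing ZooKeeper host provider and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ZooKeeper client API.
class RoundRobinCursor<T> {
    private final List<T> addresses;
    private int nextIndex = 0;        // position of the next address to try
    private int attemptsInPass = 0;   // attempts since the last backoff or success

    RoundRobinCursor(List<T> resolvedAddresses) {
        if (resolvedAddresses.isEmpty()) {
            throw new IllegalArgumentException("no addresses to try");
        }
        this.addresses = new ArrayList<>(resolvedAddresses);
    }

    /**
     * Returns how long to wait before the next attempt: zero until every
     * address has been tried once in the current pass, then the full backoff.
     */
    long backoffMillisBeforeNext(long backoffMillis) {
        if (attemptsInPass == addresses.size()) {
            attemptsInPass = 0; // start a new pass after backing off
            return backoffMillis;
        }
        return 0L;
    }

    /** Returns the next address in round-robin order. */
    T next() {
        T address = addresses.get(nextIndex);
        nextIndex = (nextIndex + 1) % addresses.size();
        attemptsInPass++;
        return address;
    }

    /** Call after a successful connection so a later reconnect starts a fresh pass. */
    void onConnected() {
        attemptsInPass = 0;
    }
}
```

With the two `localhost` addresses, a caller that checks `backoffMillisBeforeNext` before each `next()` would try both addresses back to back and wait only if both fail, so a reachable `127.0.0.1` is reached without any backoff.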