[jira] [Commented] (ZOOKEEPER-3100) ZooKeeper client times out due to random choice of resolved addresses

2018-07-27 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559837#comment-16559837
 ] 

Rajini Sivaram commented on ZOOKEEPER-3100:
---

[~andorm] In the failing Kafka test, ZooKeeper was not listening on the wildcard 
address; it was listening specifically on 127.0.0.1 
(https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/zk/EmbeddedZookeeper.scala).
Hence the connection to the IPv6 address was failing. I think in the example 
above, you were running ZooKeeper on the wildcard address, which is why it 
worked for both IPv4 and IPv6.
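
The difference between the two bindings can be reproduced without ZooKeeper. 
The following is only a minimal sketch using plain JDK sockets: the class name 
`LoopbackBindDemo` and the helper `canConnect` are made up for illustration, 
and the one-second connect timeout is an arbitrary choice. It binds a listener 
to 127.0.0.1 only and then tries every address that `localhost` resolves to. 
Which addresses are returned, and in what order, depends on the host's resolver 
configuration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; it does not use any ZooKeeper or Kafka code.
final class LoopbackBindDemo {
    // Returns true if a TCP connection to addr:port succeeds within timeoutMs.
    static boolean canConnect(InetAddress addr, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress v4Loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        // Bind to 127.0.0.1 only (not the wildcard address), on an ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 50, v4Loopback)) {
            int port = server.getLocalPort();
            // Try every address that localhost resolves to on this host.
            for (InetAddress addr : InetAddress.getAllByName("localhost")) {
                System.out.println(addr + " reachable: " + canConnect(addr, port, 1000));
            }
        }
    }
}
```

On a dual-stack host, the IPv6 loopback entry would be expected to report as 
unreachable, which is the case a randomly choosing client can keep hitting.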

We have fixed the Kafka tests by changing clients to connect to `127.0.0.1` 
instead of `localhost`, and that is a reasonable workaround since the server is 
bound explicitly to that address. But since this worked before the client 
upgrade, perhaps it would be better to guarantee that all the possible 
addresses are attempted before applying a backoff as large as a second?

> ZooKeeper client times out due to random choice of resolved addresses
> -
>
> Key: ZOOKEEPER-3100
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3100
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.13
>Reporter: Rajini Sivaram
>Assignee: Andor Molnar
>Priority: Major
>
> The changes to ZooKeeper clients to re-resolve hosts made under 
> ZOOKEEPER-2184 result in delays when only a subset of the addresses that a 
> host resolves to are actually reachable. This can result in connection 
> timeouts on the client.
> For example, when running tests with a single ZooKeeper server accepting 
> connections on 127.0.0.1 on a host that has both IPv4 and IPv6, we have seen 
> connection timeouts in tests if the client connects using `localhost` rather 
> than `127.0.0.1`. The ZooKeeper client resolves `localhost` to both the IPv4 
> and IPv6 addresses and chooses a random one. If the IPv6 address is chosen, a 
> fixed one-second backoff is applied before retry since there is only one 
> hostname specified. After backoff, `localhost` is resolved again and a random 
> address chosen, which could also be the unconnectable IPv6 address.
> For the list of host names specified for connection, the clients do 
> round-robin without backoffs until connections to all hostnames are 
> attempted. Can we also do the same for addresses that each of the hosts 
> resolves to, so that backoffs are only applied after connection to each 
> address is attempted once and every address is connected to once using 
> round-robin rather than random selection? This will avoid delays in cases 
> where at least one address can be connected to.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3100) ZooKeeper client times out due to random choice of resolved addresses

2018-07-23 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553039#comment-16553039
 ] 

Rajini Sivaram commented on ZOOKEEPER-3100:
---

[~andorm] In the failing tests, ZooKeeper was being started on `127.0.0.1` 
explicitly 
(https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/zk/EmbeddedZookeeper.scala).
Clients were connecting using `localhost`. We are fixing this for now in the 
tests by changing the clients to use `127.0.0.1`, but it used to work fine 
before upgrading ZK clients.






[jira] [Created] (ZOOKEEPER-3100) ZooKeeper client times out due to random choice of resolved addresses

2018-07-23 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created ZOOKEEPER-3100:
-

 Summary: ZooKeeper client times out due to random choice of 
resolved addresses
 Key: ZOOKEEPER-3100
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3100
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.13
Reporter: Rajini Sivaram


The changes to ZooKeeper clients to re-resolve hosts made under ZOOKEEPER-2184 
result in delays when only a subset of the addresses that a host resolves to 
are actually reachable. This can result in connection timeouts on the client.

For example, when running tests with a single ZooKeeper server accepting 
connections on 127.0.0.1 on a host that has both IPv4 and IPv6, we have seen 
connection timeouts in tests if the client connects using `localhost` rather 
than `127.0.0.1`. The ZooKeeper client resolves `localhost` to both the IPv4 
and IPv6 addresses and chooses a random one. If the IPv6 address is chosen, a 
fixed one-second backoff is applied before retry since there is only one 
hostname specified. After backoff, `localhost` is resolved again and a random 
address chosen, which could also be the unconnectable IPv6 address.

For the list of host names specified for connection, the clients do round-robin 
without backoffs until connections to all hostnames are attempted. Can we also 
do the same for addresses that each of the hosts resolves to, so that backoffs 
are only applied after connection to each address is attempted once and every 
address is connected to once using round-robin rather than random selection? 
This will avoid delays in cases where at least one address can be connected to.
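
To make the request concrete, here is a rough sketch of the proposed selection 
order. It is not the ZooKeeper client's actual code: the class 
`RoundRobinAddresses` and its methods `resolve`, `next` and `shouldBackOff` are 
hypothetical names invented for this illustration. The idea is to hand out each 
resolved address once, in order, and to signal a backoff only after a full pass 
over the list.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the ZooKeeper client API.
final class RoundRobinAddresses {
    private final List<InetSocketAddress> addresses;
    private int attempts = 0;

    RoundRobinAddresses(List<InetSocketAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = new ArrayList<>(addresses);
    }

    // Resolves every address of the host, e.g. both loopback addresses for localhost.
    static RoundRobinAddresses resolve(String host, int port) throws UnknownHostException {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            resolved.add(new InetSocketAddress(addr, port));
        }
        return new RoundRobinAddresses(resolved);
    }

    // Returns the next address in list order, wrapping around at the end.
    InetSocketAddress next() {
        InetSocketAddress address = addresses.get(attempts % addresses.size());
        attempts++;
        return address;
    }

    // True only when a whole pass over the address list has just completed.
    boolean shouldBackOff() {
        return attempts > 0 && attempts % addresses.size() == 0;
    }
}
```

With two resolved addresses, a caller would try the first, see that 
`shouldBackOff()` is still false, try the second immediately, and only then 
sleep before starting the next pass. An unreachable IPv6 address would then 
cost at most one failed attempt per pass, not a repeated one-second delay.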

 


