[
https://issues.apache.org/jira/browse/CURATOR-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184838#comment-17184838
]
J Robert Ray commented on CURATOR-578:
--------------------------------------
I am experiencing a combination of this problem and Curator eventually
attempting to connect to 0.0.0.0, as in CURATOR-392; the fix for that issue
does not handle the configuration scenario suggested by the [official Zookeeper
Docker image|https://hub.docker.com/_/zookeeper] for deploying to Docker Swarm,
specifically, using "0.0.0.0" as the bind address for the local server entry.
I have a deployment of three Zookeeper nodes in Docker Swarm, and have
attempted to give them stable IPs by pinning each container to a dedicated node
and using the node hostname when advertising the service:
{{ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=host2:2888:3888;host2:2181 server.3=host3:2888:3888;host3:2181}}
{{ZOO_SERVERS: server.1=host1:2888:3888;host1:2181 server.2=0.0.0.0:2888:3888;2181 server.3=host3:2888:3888;host3:2181}}
{{ZOO_SERVERS: server.1=host1:2888:3888;host1:2181 server.2=host2:2888:3888;host2:2181 server.3=0.0.0.0:2888:3888;2181}}
Curator is initially configured with the connection string
{{host1:2181,host2:2181,host3:2181}}. Everything is fine until one of the
Zookeeper nodes is restarted for any reason.
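For reference, the client setup on my side is just the standard factory
builder; a minimal sketch (the class name and the retry values here are
illustrative, not my exact configuration):
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkClientSetup {
    public static void main(String[] args) throws Exception {
        // The connection string lists the stable node hostnames, not container IPs.
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("host1:2181,host2:2181,host3:2181")
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();
        client.blockUntilConnected();
        // ... normal application usage ...
    }
}
{code}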
This log is from the application using Curator, captured at the moment
Zookeeper is killed on host3. The 10.5.x.x addresses are the valid IP addresses
of the Docker hosts; the 10.0.x.x addresses are the Docker Swarm internal
addresses, which change when Zookeeper restarts.
[^curator.log]
The client ends up in a loop, trying to connect to the 10.0.x.x addresses,
which may no longer be valid, and to 0.0.0.0.
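One mitigation I have been considering, but have not verified, is pinning the
connection string with a {{FixedEnsembleProvider}} whose updateServerListEnabled
flag is false, so that tracked ensemble changes are not pushed into the live
ZooKeeper handle. A rough sketch, assuming that constructor flag behaves as I
understand it (it may not cover the reconnect path at all):
{code:java}
import org.apache.curator.ensemble.fixed.FixedEnsembleProvider;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class PinnedConnectString {
    public static void main(String[] args) throws Exception {
        // Keep the hostname-based connection string; 'false' disables pushing
        // ensemble config changes to the live ZooKeeper handle.
        FixedEnsembleProvider ensemble =
                new FixedEnsembleProvider("host1:2181,host2:2181,host3:2181", false);

        CuratorFramework client = CuratorFrameworkFactory.builder()
                .ensembleProvider(ensemble)
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();
        client.blockUntilConnected();
    }
}
{code}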
Apart from this, my Zookeeper cluster does not reliably recover from a node
restart unless I manually stop all but one node (for example, the restarted
node rejects new connections because a client has a higher zxid), which is
making me reconsider trying to run the cluster under Swarm/k8s.
Curator 5.1.0, Zookeeper 3.6.1.
> EnsembleTracker replace hostname connectString with wrong ip from zk config
> ---------------------------------------------------------------------------
>
> Key: CURATOR-578
> URL: https://issues.apache.org/jira/browse/CURATOR-578
> Project: Apache Curator
> Issue Type: Bug
> Components: Client
> Affects Versions: 4.0.1
> Reporter: ying.li
> Priority: Major
> Attachments: curator.log
>
>
> I have a Zookeeper cluster which runs on a k8s cluster, and I use hostnames
> to connect to Zookeeper (like:
> zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181).
>
> When Zookeeper restarts, the zk pod's IP changes. I then find that my client
> uses the IP to recreate the connection instead of using the hostname, but
> that IP is not the latest IP for the hostname, so the client can never
> connect to zk unless the client is restarted.
>
> After some debugging, I find that the EnsembleTracker changes the
> connectString from hostnames to IPs when it receives the config change event.
> But in many cases the IP recorded from the hostname is not updated after zk
> restarts in k8s, so the client can never connect to zk unless the client is
> restarted.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)