Hi All,

In a recent PR I am trying to fix an issue that Suhas Dantkale and I found in
ZOOKEEPER-2146 (see https://github.com/apache/zookeeper/pull/1254). The
problem is that in ZooKeeper 3.5+ some quorum members cannot rejoin the
quorum after a restart if the server configs are set like this:

zoo.cfg in server 1:
server.1=0.0.0.0:2888:3888
server.2=some.fqdn-2.com:2888:3888
server.3=some.fqdn-3.com:2888:3888

zoo.cfg in server 2:
server.1=some.fqdn-1.com:2888:3888
server.2=0.0.0.0:2888:3888
server.3=some.fqdn-3.com:2888:3888

I am not exactly sure about the use case behind this config, but people
claim they need it for specific dockerized environments (see the comments
in the Jira ticket). Is anyone familiar with such use cases? We have never
used such configs in production, as far as I can tell.

The above config worked without a problem in ZooKeeper 3.4.x, but it does
not work reliably in 3.5.x. It would be logical to keep supporting it.
Still, I think that since the introduction of dynamic reconfig we more or
less assume that all the servers have the same server address
configurations. So maybe the config is not even valid anymore?

Using the 'quorumListenOnAllIPs' config property instead of 0.0.0.0 in the
configs might solve the issue. But if that is the case, then we definitely
should highlight this in the wiki / documentation. Maybe we should even
print a warning during ZooKeeper startup.
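
To make the alternative concrete, here is a minimal sketch of what I have
in mind. I have not tested it, and I am assuming that the same file can be
used on every server and that the hostnames resolve from all of them:

```
# zoo.cfg, identical on servers 1, 2 and 3 (untested sketch)
# Listen for peer connections on all local IP addresses instead of
# putting 0.0.0.0 into the server's own entry in the server list.
quorumListenOnAllIPs=true
server.1=some.fqdn-1.com:2888:3888
server.2=some.fqdn-2.com:2888:3888
server.3=some.fqdn-3.com:2888:3888
```

With this, every server would advertise the same address list, which is
what dynamic reconfig seems to expect. Whether it covers the dockerized
use cases from the ticket is something I cannot confirm.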

What do you think?

Kind regards,
Mate
