[
https://issues.apache.org/jira/browse/ZOOKEEPER-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Chen updated ZOOKEEPER-4728:
---------------------------------
Description:
Note: This issue also happens on the latest `master` branch.
When the leader tries to bind to its host/IP to accept connections from
followers and DNS is not ready at that moment, the address stays in the
{{<unresolved>}} state forever. The error log looks like this:
{code:java}
2023-07-26 00:25:25,251 ERROR Couldn't bind to localhost1/<unresolved>:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=1]]
java.net.SocketException: Unresolved address
    at java.base/java.net.ServerSocket.bind(ServerSocket.java:380)
    at java.base/java.net.ServerSocket.bind(ServerSocket.java:342)
    at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:315)
    at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:294)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:297)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1272)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1479)
2023-07-26 00:25:25,252 WARN Unexpected exception (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=1]]
java.io.IOException: Leader failed to initialize any of the following sockets: [localhost1/<unresolved>:2888]
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:300)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1272)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1479)
{code}
This error repeats and the bind never succeeds, so the quorum never forms.
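The symptom can be reproduced outside ZooKeeper with plain JDK classes. The following is a minimal sketch, not ZooKeeper code: the class and method names are hypothetical, and `InetSocketAddress.createUnresolved` stands in for a DNS lookup that failed at startup. It assumes the leader keeps reusing an address object that was created while DNS was unavailable. An `InetSocketAddress` that is unresolved stays unresolved for its whole lifetime, and `ServerSocket.bind` rejects it with the same message as in the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical demo class, not part of ZooKeeper.
class StaleAddressSketch {

    /** Returns the bind failure message, or null if the bind succeeded. */
    static String tryBind(InetSocketAddress address) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(address);
            return null;
        } catch (IOException e) {
            // SocketException is an IOException, so it is caught here too.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a hostname lookup that failed when the object was created.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost1", 2888);

        // The object never retries the lookup, even if DNS becomes ready later.
        System.out.println("unresolved: " + stale.isUnresolved());
        System.out.println("bind result: " + tryBind(stale));
    }
}
```

Retrying the bind with the same object can never succeed, however often it is attempted.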
Steps to reproduce:
1. Set up one ZooKeeper node with the following server entry in its config:
{code:java}
server.1=localhost1:2888:3888{code}
Note that the hostname is "localhost1", not "localhost".
2. Start the ZooKeeper node. It logs the `Exception while listening` error as
well as the `Couldn't bind to localhost1/<unresolved>:2888` error shown above.
This simulates DNS not being ready when ZooKeeper starts, which is quite common
in k8s environments.
3. Edit /etc/hosts and map `localhost1` to `127.0.0.1`.
4. Check the log: the `Exception while listening` error is gone, but
`Couldn't bind to localhost1/<unresolved>:2888` keeps appearing, and the quorum
never forms.
Note: The `Exception while listening` error heals itself because that code path
re-resolves the hostname each time it tries to bind. We should apply the same
solution to the leader binding (i.e. ZOOKEEPER-3991).
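The re-resolve idea in the note above can be sketched as follows. This is only an illustration under stated assumptions, not the actual ZooKeeper change: `ReResolveSketch` and `reResolve` are hypothetical names. It relies on the documented JDK behavior that `new InetSocketAddress(String, int)` attempts a hostname lookup each time it is called.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of ZooKeeper.
class ReResolveSketch {

    /**
     * Returns the address unchanged if it is already resolved. Otherwise
     * builds a new InetSocketAddress from the same host name and port,
     * which triggers a new lookup. The result is still unresolved if the
     * lookup fails again.
     */
    static InetSocketAddress reResolve(InetSocketAddress address) {
        if (!address.isUnresolved()) {
            return address;
        }
        return new InetSocketAddress(address.getHostString(), address.getPort());
    }
}
```

A caller would invoke `reResolve` right before each bind attempt, so that a hostname that became resolvable after startup is picked up on the next retry.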
> ZooKeeper cannot bind to itself forever if DNS is not ready when startup
> ------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4728
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4728
> Project: ZooKeeper
> Issue Type: Sub-task
> Affects Versions: 3.6.4
> Reporter: Luke Chen
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)