[ https://issues.apache.org/jira/browse/ZOOKEEPER-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859994#comment-17859994 ]
Swathi Mocharla commented on ZOOKEEPER-4842:
--------------------------------------------
Note that without the trailing dot in the cluster domain, we have never been
able to reproduce this issue.
> Zookeeper quorum is not formed intermittently with trailing dot in the
> cluster domain name
> ------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4842
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4842
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.8.4
> Reporter: Swathi Mocharla
> Priority: Major
>
> On Kubernetes, we have configured the cluster domain with a trailing dot.
> With this setting, the ZooKeeper quorum frequently fails to form.
>
> {code:java}
> bash-4.4$ env -u KAFKA_OPTS zookeeper-shell localhost:2181 config
> Connecting to localhost:2181
> [2024-06-25 10:36:39,178] WARN Client session timed out, have not heard from server in 30031ms for session id 0x0 (org.apache.zookeeper.ClientCnxn)
> [2024-06-25 10:36:39,182] WARN Session 0x0 for server localhost/[0:0:0:0:0:0:0:1]:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 30031ms for session id 0x0
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1257)
> KeeperErrorCode = ConnectionLoss for /zookeeper/config
>
> {code}
>
> The ZooKeeper logs contain many IOException, UnknownHostException, and
> InterruptedException errors.
>
> {code:java}
> java.io.IOException: ZooKeeperServer not running
>     at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:565)
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:350)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
>     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:153)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {"type":"log", "host":"zk-swkf-2.default", "level":"WARN", "systemid":"zookeeper-2b13339237454984887b4908dc3a6df0", "system":"zookeeper", "time":"2024-06-25T10:23:16.325Z", "timezone":"UTC", "log":{"message":"NIOWorkerThread-1 - org.apache.zookeeper.server.NIOServerCnxn - Close of session 0x0"}}
>
> java.lang.InterruptedException
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
>     at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
> {code}
>
>
> This is the content of /etc/resolv.conf:
> {code:java}
> bash-4.4$ cat /etc/resolv.conf
> search default.svc.cluster.local svc.cluster.local cluster.local bcmt
> nameserver 10.254.0.10
> options ndots:5
> {code}
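The interaction between a trailing dot and the `search`/`ndots` settings above can be modeled roughly as follows. This is a simplified sketch of the search behavior documented in resolv.conf(5), not glibc or ZooKeeper source. A name ending in a dot is absolute and skips the search list. A name with fewer than `ndots` dots has each search domain appended before it is tried as-is. The sketch ignores details such as error-dependent early termination, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (an assumption, not glibc or ZooKeeper source) of how a
 * stub resolver applies the resolv.conf "search" list and "ndots" option.
 */
public class ResolverSearchSketch {

    /**
     * Returns the fully qualified names a resolver would query, in the order
     * they would be tried until one succeeds (trailing dot = absolute name).
     */
    public static List<String> candidates(String name, List<String> search, int ndots) {
        List<String> out = new ArrayList<>();
        // A trailing dot marks the name as absolute: the search list is skipped.
        if (name.endsWith(".")) {
            out.add(name);
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        // With at least ndots dots, the name is tried as-is first.
        if (dots >= ndots) {
            out.add(name + ".");
        }
        // Each search domain is then appended in turn.
        for (String domain : search) {
            out.add(name + "." + domain + ".");
        }
        // Names with fewer than ndots dots are tried as-is only at the end.
        if (dots < ndots) {
            out.add(name + ".");
        }
        return out;
    }

    public static void main(String[] args) {
        // Values copied from the /etc/resolv.conf shown above.
        List<String> search = List.of(
                "default.svc.cluster.local", "svc.cluster.local", "cluster.local", "bcmt");
        int ndots = 5;

        for (String name : List.of(
                "zk-swkf.default.svc.cluster.local",
                "zk-swkf.default.svc.cluster.local.")) {
            System.out.println(name + " ->");
            for (String query : candidates(name, search, ndots)) {
                System.out.println("  " + query);
            }
        }
    }
}
```

With `ndots:5`, `zk-swkf.default.svc.cluster.local` contains only four dots. The model therefore tries the four search-expanded names first and the name itself last, while the trailing-dot form produces a single absolute query. The two spellings can reach the same record by quite different query paths.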
>
>
> {code:java}
> [root@vm-10-76-72-33 ckaf-kafka]# nslookup zk-swkf.default.svc.cluster.local.
> Server: 10.76.72.33
> Address: 10.76.72.33#53
> Name: zk-swkf.default.svc.cluster.local
> Address: 10.254.94.24
> [root@vm-10-76-72-33 ckaf-kafka]# nslookup zk-swkf.default.svc.cluster.local
> Server: 10.76.72.33
> Address: 10.76.72.33#53
> Name: zk-swkf.default.svc.cluster.local
> Address: 10.254.94.24
> {code}
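Both spellings resolve to the same address above. This is expected, because in DNS `zk-swkf.default.svc.cluster.local.` is simply the fully qualified form of `zk-swkf.default.svc.cluster.local`. As Java strings, however, the two names are not equal. The report does not identify which ZooKeeper code path is sensitive to the trailing dot. The following is therefore only a hypothetical illustration, with invented class and helper names, of why a component that compares host names as plain text could see two different hosts. It also shows one way to normalize them, assuming the undotted form is meant to be fully qualified.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not a ZooKeeper API): compares host names the way DNS
 * does, ignoring ASCII case and a single trailing root dot.
 */
public class HostNameCompareSketch {

    /** Strips one trailing dot and lower-cases the name. */
    public static String normalize(String host) {
        String h = host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
        return h.toLowerCase(Locale.ROOT);
    }

    /** True if both names denote the same DNS name after normalization. */
    public static boolean sameDnsName(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String absolute = "zk-swkf.default.svc.cluster.local.";
        String relative = "zk-swkf.default.svc.cluster.local";
        // Plain string equality treats the two spellings as different hosts.
        System.out.println("String.equals: " + absolute.equals(relative));
        // After normalization they compare equal, consistent with the nslookup output.
        System.out.println("sameDnsName:   " + sameDnsName(absolute, relative));
    }
}
```

Running `main` prints `String.equals: false` followed by `sameDnsName:   true`.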
--
This message was sent by Atlassian Jira
(v8.20.10#820010)