[ https://issues.apache.org/jira/browse/ZOOKEEPER-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487059#comment-17487059 ]
Anoop Negi commented on ZOOKEEPER-4440:
---------------------------------------

We are stuck on this. Is there any update, please?

> Zookeeper Upgrade failed when disabling Plain-text communication and ensemble failed to form
> --------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4440
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4440
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.7.0
>         Environment: Kubernetes 1.21.1
>            Reporter: Anoop Negi
>            Priority: Critical
>
> We have a three (3) node ZooKeeper cluster running as pods on a Kubernetes
> cluster. The ZooKeeper version is 3.7.0. While upgrading ZooKeeper from
> Plain-text+Secure mode to secure-only mode (i.e. disabling the plain-text
> channel), we are facing an issue.
> 1. To disable plain-text we are removing <clientport> from the dynamic
> configuration file to enable only secure communication, but after the upgrade
> the ZooKeeper ensemble failed to form. Leader election keeps failing and we
> are getting notification timeouts.
> {code:java}
> #server configuration
> server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
> server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
> server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
> #secure port enabled
> secureClientPort=2281
> {code}
> {code:java}
>
> 2021-05-19T08:00:06.900+0000 [myid:] - WARN [QuorumConnectionThread-[myid=3]-3:QuorumCnxManager@400] - Cannot open channel to 1 at election address server1zookeeper/192.168.57.156:3888
> java.net.SocketTimeoutException: connect timed out
>     at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
>     at java.net.Socket.connect(Socket.java:609) ~[?:?]
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383) [zookeeper-3.7.0.jar:3.7.0]
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457) [zookeeper-3.7.0.jar:3.7.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> {code}
> {code:java}
> 2021-05-19T07:47:56.894+0000 [myid:] - INFO [QuorumPeer[myid=1](plain=disabled)(secure=0.0.0.0:2281):FastLeaderElection@979] - Notification time out: 60000
> {code}
> 2. We also tried to perform the reconfiguration from the CLI using zkCli.sh,
> but this is also not working. We used "reconfig -members" and provided the
> server details, but the ZooKeeper ensemble is not updated and we get an
> error. We created a DigestAuthenticationProvider user to allow reconfig.
> {code:java}
> [zk: zookeeper:2281(CONNECTED) 0]
> [zk: zookeeper:2281(CONNECTED) 0]
> [zk: zookeeper:2281(CONNECTED) 0] config
> server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
> server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
> server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
> version=1700000000
> [zk: zookeeper:2281(CONNECTED) 1]
> [zk: zookeeper:2281(CONNECTED) 1]
> [zk: zookeeper:2281(CONNECTED) 1] addauth digest zookeeper:admin
> [zk: zookeeper:2281(CONNECTED) 2]
> [zk: zookeeper:2281(CONNECTED) 2]
> [zk: zookeeper:2281(CONNECTED) 2] reconfig -members server.1=server1zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.2=server2zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.3=server3zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
> 2021-05-19T08:16:43.376+0000 [myid:zookeeper:2281] - WARN [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1242] - Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
> 2021-05-19T08:16:43.377+0000 [myid:zookeeper:2281] - WARN [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1285] - Session 0x30169d99fdf0000 for sever zookeeper/10.107.240.229:2281, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243) [zookeeper-3.7.0.jar:3.7.0]
> WATCHER::
> WatchedEvent state:Disconnected type:None path:null
> 2021-05-19T08:16:43.390+0000 [myid:] - INFO [nioEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@469] - channel is disconnected: [id: 0xa97b55e0, L:/192.168.220.12:47114 ! R:zookeeper/10.107.240.229:2281]
> 2021-05-19T08:16:43.392+0000 [myid:] - INFO [nioEventLoopGroup-2-1:ClientCnxnSocketNetty@249] - channel is told closing
> KeeperErrorCode = ConnectionLoss
> {code}
> Kindly suggest a way to perform the upgrade with the desired changes that
> also works with rollback.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
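
For readers comparing the two server specifications quoted in the issue (with and without the trailing `;0.0.0.0:2181`), the following is a minimal Python sketch that splits a dynamic-configuration `server.N=` entry into its parts and reports whether a plain-text client port is present. It assumes the entry layout `server.<id>=<host>:<quorumPort>:<electionPort>[:<role>][;[<clientHost>:]<clientPort>]`; the names `SERVER_LINE` and `parse_server_line` are hypothetical, IPv6 addresses are not handled, and it is not part of ZooKeeper itself. It only inspects configuration text and does not diagnose the election timeouts reported above.

```python
import re

# Assumed layout of a dynamic-config entry (IPv6 hosts are not handled):
#   server.<id>=<host>:<quorumPort>:<electionPort>[:<role>][;[<clientHost>:]<clientPort>]
SERVER_LINE = re.compile(
    r"^server\.(?P<id>\d+)="
    r"(?P<host>[^:;]+):(?P<quorum>\d+):(?P<election>\d+)"
    r"(?::(?P<role>participant|observer))?"
    r"(?:;(?:(?P<client_host>[^:;]+):)?(?P<client_port>\d+))?$"
)


def parse_server_line(line):
    """Split one server.N= entry into a dict; client_port is None if absent."""
    match = SERVER_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a recognised server entry: %r" % line)
    parts = match.groupdict()
    return {
        "id": int(parts["id"]),
        "host": parts["host"],
        "quorum_port": int(parts["quorum"]),
        "election_port": int(parts["election"]),
        "role": parts["role"],
        "client_host": parts["client_host"],
        "client_port": int(parts["client_port"]) if parts["client_port"] else None,
    }


if __name__ == "__main__":
    # The two forms that appear in the issue text.
    secure_only = "server.1=server1zookeeper.svc.cluster.local:2888:3888:participant"
    with_plain = secure_only + ";0.0.0.0:2181"
    for entry in (secure_only, with_plain):
        info = parse_server_line(entry)
        state = "enabled" if info["client_port"] is not None else "disabled"
        print("server.%d plain-text client port: %s" % (info["id"], state))
```

An entry without the `;...` suffix carries no plain-text client port, which matches the `plain=disabled` marker in the quoted `QuorumPeer` log line.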