[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487059#comment-17487059
 ] 

Anoop Negi commented on ZOOKEEPER-4440:
---------------------------------------

We are stuck on this issue. Is there any update, please?

> Zookeeper Upgrade failed when disabling Plain-text communication and ensemble 
> failed to form
> --------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4440
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4440
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.7.0
>         Environment: Kubernetes 1.21.1
>            Reporter: Anoop Negi
>            Priority: Critical
>
> We have a three-node ZooKeeper cluster (version 3.7.0) running as pods on a 
> Kubernetes cluster. While upgrading ZooKeeper from plain-text + secure mode 
> to secure-only mode (i.e. disabling the plain-text channel), we are facing an 
> issue.
> 1. To disable plain-text, we remove <clientport> from the dynamic 
> configuration file so that only secure communication is enabled, but after 
> the upgrade the ZooKeeper ensemble fails to form. Leader election keeps 
> failing with notification timeouts.
> {code:java}
> #server configuration
> server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
> server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
> server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
> #secure port enabled
> secureClientPort=2281
> {code}
> {code:java}
>  
> 2021-05-19T08:00:06.900+0000 [myid:] - WARN [QuorumConnectionThread-[myid=3]-3:QuorumCnxManager@400] - Cannot open channel to 1 at election address server1zookeeper/192.168.57.156:3888
> java.net.SocketTimeoutException: connect timed out
>         at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
>         at java.net.Socket.connect(Socket.java:609) ~[?:?]
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383) [zookeeper-3.7.0.jar:3.7.0]
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457) [zookeeper-3.7.0.jar:3.7.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>         at java.lang.Thread.run(Thread.java:834) [?:?]
> {code}
> {code:java}
> 2021-05-19T07:47:56.894+0000 [myid:] - INFO  [QuorumPeer[myid=1](plain=disabled)(secure=0.0.0.0:2281):FastLeaderElection@979] - Notification time out: 60000
> {code}
> 2. We also tried to reconfigure the ensemble from the CLI using zkCli.sh, but 
> this did not work either. We ran "reconfig -members" with the server details, 
> but the ensemble was not updated and the command returned an error. We had 
> created a DigestAuthenticationProvider user to allow reconfig.
> {code:java}
> [zk: zookeeper:2281(CONNECTED) 0]
> [zk: zookeeper:2281(CONNECTED) 0]
> [zk: zookeeper:2281(CONNECTED) 0] config
> server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
> server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
> server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
> version=1700000000
> [zk: zookeeper:2281(CONNECTED) 1]
> [zk: zookeeper:2281(CONNECTED) 1]
> [zk: zookeeper:2281(CONNECTED) 1] addauth digest zookeeper:admin
> [zk: zookeeper:2281(CONNECTED) 2]
> [zk: zookeeper:2281(CONNECTED) 2]
> [zk: zookeeper:2281(CONNECTED) 2] reconfig -members 
> server.1=server1zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.2=server2zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.3=server3zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
> 2021-05-19T08:16:43.376+0000 [myid:zookeeper:2281] - WARN  [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1242] - Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
> 2021-05-19T08:16:43.377+0000 [myid:zookeeper:2281] - WARN  [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1285] - Session 0x30169d99fdf0000 for sever zookeeper/10.107.240.229:2281, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243) [zookeeper-3.7.0.jar:3.7.0]
> WATCHER::
> WatchedEvent state:Disconnected type:None path:null
> 2021-05-19T08:16:43.390+0000 [myid:] - INFO  [nioEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@469] - channel is disconnected: [id: 0xa97b55e0, L:/192.168.220.12:47114 ! R:zookeeper/10.107.240.229:2281]
> 2021-05-19T08:16:43.392+0000 [myid:] - INFO  [nioEventLoopGroup-2-1:ClientCnxnSocketNetty@249] - channel is told closing
> KeeperErrorCode = ConnectionLoss
> {code}
> Kindly suggest a way to perform the upgrade with the desired changes that 
> also works with rollback.
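
The dynamic configuration quoted above can be checked mechanically for client-port entries before and after the change. The following is a minimal Python sketch, not part of the original report, assuming the documented dynamic-config line format server.<id>=<host>:<quorum port>:<election port>[:<role>][;[<client address>:]<client port>]. The names parse_dynamic_config, ServerSpec and REPORTED_CONFIG are hypothetical, and IPv6 or multi-address entries are not handled.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line format (single address per server, no IPv6):
# server.<id>=<host>:<quorum port>:<election port>[:<role>][;[<client addr>:]<client port>]
SERVER_LINE = re.compile(
    r"^server\.(?P<sid>\d+)=(?P<host>[^:;]+):(?P<quorum>\d+):(?P<election>\d+)"
    r"(?::(?P<role>participant|observer))?"
    r"(?:;(?:(?P<caddr>[^:;]+):)?(?P<cport>\d+))?$"
)


class ServerSpec(NamedTuple):
    sid: int
    host: str
    quorum_port: int
    election_port: int
    role: str
    client_port: Optional[int]  # None when the line has no ";<client port>" part


def parse_dynamic_config(text: str) -> List[ServerSpec]:
    """Parse the server.N lines of a dynamic config; other lines are skipped."""
    specs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("server."):
            continue  # comments, secureClientPort=..., version=..., blank lines
        m = SERVER_LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognised server line: {line!r}")
        cport = m.group("cport")
        specs.append(
            ServerSpec(
                sid=int(m.group("sid")),
                host=m.group("host"),
                quorum_port=int(m.group("quorum")),
                election_port=int(m.group("election")),
                role=m.group("role") or "participant",
                client_port=int(cport) if cport else None,
            )
        )
    return specs


# The configuration as quoted in the report (secure-only variant).
REPORTED_CONFIG = """\
#server configuration
server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
#secure port enabled
secureClientPort=2281
"""

if __name__ == "__main__":
    for spec in parse_dynamic_config(REPORTED_CONFIG):
        state = "none" if spec.client_port is None else str(spec.client_port)
        print(f"server.{spec.sid}: election port {spec.election_port}, client port {state}")
```

Running the same function over the member list passed to "reconfig -members" shows which entries still carry a plain-text client port (2181 in the command above), which makes the difference between the two configurations explicit.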



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
