Jeetendra N created ZOOKEEPER-4888:
--------------------------------------

             Summary: Issues with TLS post upgrade from 3.9.1 to 3.9.2
                 Key: ZOOKEEPER-4888
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4888
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.9.2
            Reporter: Jeetendra N
             Fix For: 3.9.2

We upgraded our ZooKeeper ensemble from 3.9.1 to 3.9.2. TLS (node-to-node and client-to-node) was enabled before the upgrade, and everything was working fine at that point.

Post upgrade:
# Stopped everything (all ZK nodes).
# Started all ZK nodes.
# Checked whether SSL between the ZK nodes was working.
# Confirmed that SSL is working fine between the ZK nodes.
# Started just one instance of the client application.
# After that, we see intermittent successful and unsuccessful handshake messages in the ZK logs.

*On the ZK server side, we see the messages below:*

2024-11-21 13:28:15,586 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting standard x509 trust manager com.ibm.jsse2.br@4362299c
2024-11-21 13:28:15,586 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.c.X509Util@644] - Using Java8 optimized cipher suites for Java version 1.8
2024-11-21 13:28:15,588 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler added for channel: [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
2024-11-21 13:28:15,620 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@415] - *Successful handshake with session 0x0*
2024-11-21 13:28:15,620 [myid:] - DEBUG [epollEventLoopGroup-4-9:i.n.h.s.SslHandler@1934] - [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272] HANDSHAKEN: protocol:TLSv1.3 cipher suite:TLS_AES_256_GCM_SHA384
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@350] - New message PooledUnsafeDirectByteBuf(ridx: 0, widx: 4, cap: 42) from [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@368] - 0x0 queuedBuffer: null
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@386] - not throttled
2024-11-21 13:28:15,623 [myid:] - INFO [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@311] - Processing mntr command from /10.1.10.46:57272
2024-11-21 13:28:15,642 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:15,642 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@131] - close in progress for session id: 0x0
2024-11-21 13:28:15,644 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:15,644 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
2024-11-21 13:28:17,155 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting standard x509 trust manager com.ibm.jsse2.br@a5cca67c
2024-11-21 13:28:17,156 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.c.X509Util@644] - Using Java8 optimized cipher suites for Java version 1.8
2024-11-21 13:28:17,158 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler added for channel: [id: 0xb818882d, L:/10.1.10.50:2181 - R:/10.1.10.46:57276]
2024-11-21 13:28:17,161 [myid:] - ERROR [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@466] - *Unsuccessful handshake with session 0x0*
2024-11-21 13:28:17,161 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:17,162 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
2024-11-21 13:28:17,163 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:17,163 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0

*On the client side, we see the message below intermittently:*

17:37:43.878 [pool-7-thread-1-SendThread(10.1.10.50:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server bdc-dev1807.in.syncsort.dev/10.1.10.50:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: channel for sessionid 0x0 is lost
	at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:287)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1274)

*We also see successful SSL connections from the client side:*

INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
Nov 21, 2024 5:46:04 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181

*We have not set any TLS protocol version or ciphers on the client or server side.*

*We are using IBM JDK 8.*

Please help troubleshoot this issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
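
To make the intermittent pattern easier to see, the server log can be tallied per remote address. The following is a minimal sketch, not part of ZooKeeper: the class name `HandshakeTally` and its `tally` method are hypothetical, and it assumes one log entry per line in the format quoted in this report, with each handshake result logged on the same event-loop thread as the preceding "SSL handler added for channel" entry.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts handshake outcomes per remote IP in a ZK server log.
final class HandshakeTally {
    // Event-loop thread name, e.g. "[epollEventLoopGroup-4-9:" in the logger tag.
    private static final Pattern THREAD =
            Pattern.compile("\\[(epollEventLoopGroup-[\\d-]+):");
    // Remote IP from "SSL handler added for channel: [id: ..., L:/... - R:/ip:port]".
    private static final Pattern REMOTE =
            Pattern.compile("SSL handler added for channel: .*R:/([\\d.]+):\\d+\\]");
    // Result line, with or without the surrounding '*' emphasis markers.
    private static final Pattern RESULT =
            Pattern.compile("\\*?(Successful|Unsuccessful) handshake");

    /** Returns remote IP -> {successful count, unsuccessful count}. */
    static Map<String, int[]> tally(List<String> lines) {
        Map<String, String> remoteByThread = new HashMap<>();
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher t = THREAD.matcher(line);
            if (!t.find()) {
                continue; // not a Netty event-loop entry
            }
            String thread = t.group(1);
            Matcher r = REMOTE.matcher(line);
            if (r.find()) {
                // Remember which peer this thread is currently handshaking with.
                remoteByThread.put(thread, r.group(1));
                continue;
            }
            Matcher h = RESULT.matcher(line);
            if (h.find()) {
                String remote = remoteByThread.getOrDefault(thread, "unknown");
                int[] c = counts.computeIfAbsent(remote, k -> new int[2]);
                c["Successful".equals(h.group(1)) ? 0 : 1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the server log file (assumed, one entry per line).
        Map<String, int[]> counts = tally(Files.readAllLines(Paths.get(args[0])));
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            System.out.printf("%s ok=%d failed=%d%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```

In the excerpt above, both the successful and the unsuccessful handshake come from 10.1.10.46, and the successful one carried a mntr command, so grouping by remote address and then checking what each connection sent may help narrow down which caller is failing.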