[ https://issues.apache.org/jira/browse/KAFKA-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaurav Narula updated KAFKA-16305: ---------------------------------- Summary: Optimisation in SslTransportLayer:handshakeUnwrap stalls TLS handshake (was: # Optimisation in SslTransportLayer:handshakeUnwrap stalls TLS handshake) > Optimisation in SslTransportLayer:handshakeUnwrap stalls TLS handshake > ---------------------------------------------------------------------- > > Key: KAFKA-16305 > URL: https://issues.apache.org/jira/browse/KAFKA-16305 > Project: Kafka > Issue Type: Bug > Affects Versions: 3.7.0, 3.6.1 > Reporter: Gaurav Narula > Assignee: Gaurav Narula > Priority: Major > > Kafka allows users to configure custom {{SSLEngine}} via the > {{SslEngineFactory}} interface. There have been attempts to use an OpenSSL > based {{SSLEngine}} using {{netty-handler}} over the JDK based implementation > for performance reasons. > While trying to use a Netty/Openssl based SSLEngine, we observe that the > server hangs while performing the TLS handshake. We observe the following > logs > {code} > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state NOT_INITIALIZED > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [INFO ] > Selector - [SocketServer listenerType=ZK_BROKER, nodeId=101] > XX-Gaurav: calling prepare channelId 127.0.0.1:60045-127.0.0.1:60046-0 > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state NOT_INITIALIZED > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake NEED_UNWRAP channelId > 127.0.0.1:60045-127.0.0.1:60046-0, appReadBuffer pos 0, netReadBuffer pos 0, > netWriteBuffer pos 0 > 2024-02-26 01:40:00,117 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake handshakeUnwrap channelId > 127.0.0.1:60045-127.0.0.1:60046-0 doRead true > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake post handshakeUnwrap: channelId > 127.0.0.1:60045-127.0.0.1:60046-0 handshakeStatus NEED_UNWRAP status > BUFFER_UNDERFLOW read 0 > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake post unwrap NEED_UNWRAP channelId > 127.0.0.1:60045-127.0.0.1:60046-0, handshakeResult Status = BUFFER_UNDERFLOW > HandshakeStatus = NEED_UNWRAP bytesConsumed = 0 bytesProduced = 0, > appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 0 > handshakeStatus NEED_UNWRAP > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,118 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,119 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,119 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,120 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [INFO ] > Selector - [SocketServer listenerType=ZK_BROKER, nodeId=101] > XX-Gaurav: calling prepare channelId 127.0.0.1:60045-127.0.0.1:60046-0 > 2024-02-26 01:40:00,120 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,120 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId > 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE > 2024-02-26 01:40:00,120 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake NEED_UNWRAP channelId > 127.0.0.1:60045-127.0.0.1:60046-0, appReadBuffer pos 0, netReadBuffer pos > 383, netWriteBuffer pos 0 > 2024-02-26 01:40:00,120 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake handshakeUnwrap channelId > 127.0.0.1:60045-127.0.0.1:60046-0 doRead true > 2024-02-26 01:40:00,122 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake post handshakeUnwrap: channelId > 127.0.0.1:60045-127.0.0.1:60046-0 handshakeStatus NEED_UNWRAP status OK read 0 > 2024-02-26 01:40:00,122 > data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] > SslTransportLayer - [SslTransportLayer > channelId=127.0.0.1:60045-127.0.0.1:60046-0 > key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 > remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, > interestOps=1, readyOps=0] SSLHandshake post unwrap NEED_UNWRAP channelId > 127.0.0.1:60045-127.0.0.1:60046-0, handshakeResult Status = OK > HandshakeStatus = NEED_TASK bytesConsumed = 383 bytesProduced = 0, > appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 0 > handshakeStatus NEED_UNWRAP > {code} > Lines 21 and 22 are of interest here as we observe that: > 1. {{handshakeStatus}} is {{NEED_UNWRAP}} and > 2. {{netReadBuffer.position() == 0}} > The do-while loop in {{handshakeUnwrap()}} is > [exited|https://github.com/apache/kafka/blob/9bc9fae9425e4dac64ef078cd3a4e7e6e09cc45a/clients/src/main/java/org/apache/kafka/common/network/SslTransportLayer.java#L529] > because of (2) and as a result, the {{SSLEngine}}'s state machine isn't > handled properly. > We should therefore, avoid the optimisation and simplify the condition to be > just {{while(cont)}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)