Arushi Helms created KAFKA-16820:
------------------------------------
Summary: Kafka Broker fails to connect to Kraft Controller with no
DNS matching
Key: KAFKA-16820
URL: https://issues.apache.org/jira/browse/KAFKA-16820
Project: Kafka
Issue Type: Bug
Components: kraft
Affects Versions: 3.6.1, 3.7.0, 3.8.0
Reporter: Arushi Helms
Attachments: Screenshot 2024-05-22 at 1.09.11 PM-1.png
We are migrating our Kafka cluster from ZooKeeper to KRaft mode. We run
brokers and controllers as separate nodes with TLS enabled, and all nodes are
addressed by IP.
The TLS setup works fine among the brokers. The certificate looks something
like this:
{noformat}
Common Name: *.kafka.service.consul
Subject Alternative Names: *.kafka.service.consul, IP
Address:10.87.171.84{noformat}
Note: The node's DNS name does not match the CN, but since we use IPs for
communication, we have added the IPs as SANs.
The same applies to the controllers: their IPs are listed as SANs in the
certificate.
The current setup runs 3 brokers and 3 controllers.
Relevant controller configurations from one of the controllers:
{noformat}
KAFKA_CFG_PROCESS_ROLES=controller
KAFKA_KRAFT_CLUSTER_ID=5kztjhJ4SxSu-kdiEYDUow
KAFKA_CFG_NODE_ID=6
[email protected]:9097,[email protected]:9097,[email protected]:9097
KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:SSL,INSIDE_SSL:SSL
KAFKA_CFG_LISTENERS=CONTROLLER://10.87.170.6:9097{noformat}
Relevant broker configuration from one of the brokers:
{noformat}
KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
[email protected]:9097,[email protected]:9097,[email protected]:9097
KAFKA_CFG_PROCESS_ROLES=broker
KAFKA_CFG_NODE_ID=3
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE_SSL:SSL,OUTSIDE_SSL:SSL,CONTROLLER:SSL
KAFKA_CFG_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096
KAFKA_CFG_ADVERTISED_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096{noformat}
ISSUE 1:
With this setup, the Kafka broker fails to connect to the controller with the
following error:
{noformat}
[2024-05-22 17:53:46,413] ERROR
[broker-2-to-controller-heartbeat-channel-manager]: Request
BrokerRegistrationRequestData(brokerId=2, clusterId='5kztjhJ4SxSu-kdiEYDUow',
incarnationId=7741fgH6T4SQqGsho8E6mw, listeners=[Listener(name='INSIDE_SSL',
host='10.87.170.81', port=9093, securityProtocol=1), Listener(name='INSIDE',
host='10.87.170.81', port=9094, securityProtocol=0), Listener(name='OUTSIDE',
host='10.87.170.81', port=9092, securityProtocol=0),
Listener(name='OUTSIDE_SSL', host='10.87.170.81', port=9096,
securityProtocol=1)], features=[Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=19)], rack=null,
isMigratingZkBroker=false, logDirs=[TJssfKDD-iBFYfIYCKOcew],
previousBrokerEpoch=-1) failed due to authentication error with controller
(kafka.server.NodeToControllerRequestThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name
matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
    at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
    at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
    at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
    at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
Caused by: java.security.cert.CertificateException: No subject alternative DNS
name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
    at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
    at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329)
    ... 19 more{noformat}
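The `HostnameChecker.matchDNS` frame in the trace above suggests the peer was identified by a host name rather than by its IP. In that case only DNS-type SANs are considered, so the IP SAN is never consulted. Below is a minimal Java sketch of that rule, not the JDK's implementation: it assumes IPv4 only and a simplified wildcard match. `SanMatchSketch` and its helpers are hypothetical names; the SAN entries are modeled in the shape returned by `X509Certificate.getSubjectAlternativeNames()` (type 2 = DNS name, type 7 = IP address).

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Simplified illustration of SAN-based endpoint identification (hypothetical helper). */
final class SanMatchSketch {
    static final int SAN_DNS = 2; // GeneralName type for dNSName
    static final int SAN_IP = 7;  // GeneralName type for iPAddress

    /** True if the host string is a dotted-quad IPv4 literal (IPv6 omitted for brevity). */
    static boolean isIpv4Literal(String host) {
        String[] parts = host.split("\\.", -1);
        if (parts.length != 4) return false;
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) return false;
            for (int i = 0; i < p.length(); i++) {
                char c = p.charAt(i);
                if (c < '0' || c > '9') return false;
            }
            if (Integer.parseInt(p) > 255) return false;
        }
        return true;
    }

    /** Matches a DNS SAN; a wildcard is allowed only as the entire leftmost label. */
    static boolean dnsMatches(String pattern, String host) {
        String p = pattern.toLowerCase(Locale.ROOT);
        String h = host.toLowerCase(Locale.ROOT);
        if (p.startsWith("*.")) {
            int dot = h.indexOf('.');
            return dot > 0 && h.substring(dot).equals(p.substring(1));
        }
        return p.equals(h);
    }

    /** An IP literal is checked against IP SANs only; any other name against DNS SANs only. */
    static boolean matches(String peerHost, Collection<List<?>> sans) {
        boolean ip = isIpv4Literal(peerHost);
        Integer wantedType = ip ? SAN_IP : SAN_DNS;
        for (List<?> san : sans) {
            if (!wantedType.equals(san.get(0))) continue;
            String value = (String) san.get(1);
            if (ip ? value.equals(peerHost) : dnsMatches(value, peerHost)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // SAN values taken from the certificate shown in this report.
        Collection<List<?>> sans = List.<List<?>>of(
                List.of(SAN_DNS, "*.kafka.service.consul"),
                List.of(SAN_IP, "10.87.171.84"));
        System.out.println(matches("10.87.171.84", sans));
        System.out.println(matches(
                "cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul", sans));
    }
}
```

Under these assumptions the IP literal matches the IP SAN, while the reverse-resolved node name matches neither the wildcard DNS SAN nor the IP SAN.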
ISSUE 2:
It looks like the KRaft controller also does a reverse DNS lookup on itself at
startup, and we see the same DNS name matching failure in the controller. Log
snippet from the controller with node ID 4:
{noformat}
[2024-05-16 20:57:07,962] INFO [SocketServer listenerType=CONTROLLER, nodeId=4]
Failed authentication with /10.87.170.83
(channelId=10.87.170.83:9097-10.87.170.83:42548-3) (SSL handshake failed)
(org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,118] INFO
[ControllerRegistrationManager id=4 incarnation=HWT3UBxJSPGuefZ9xdqH-g]
sendControllerRegistration: attempting to send
ControllerRegistrationRequestData(controllerId=4,
incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097,
securityProtocol=1)], features=[Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=19)])
(kafka.server.ControllerRegistrationManager)
[2024-05-16 20:57:11,129] INFO
[NodeToControllerChannelManager id=4 name=registration] Failed authentication
with cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83
(channelId=4) (SSL handshake failed)
(org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,130] INFO
[NodeToControllerChannelManager id=4 name=registration] Node 4 disconnected.
(org.apache.kafka.clients.NetworkClient)
[2024-05-16 20:57:11,130] INFO
[SocketServer listenerType=CONTROLLER, nodeId=4] Failed authentication with
/10.87.170.83 (channelId=10.87.170.83:9097-10.87.170.83:42564-4) (SSL handshake
failed) (org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,130] ERROR
[NodeToControllerChannelManager id=4 name=registration] Connection to node 4
(cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83:9097)
failed authentication due to: SSL handshake failed
(org.apache.kafka.clients.NetworkClient)
[2024-05-16 20:57:11,131] ERROR
[controller-4-to-controller-registration-channel-manager]: Failed to send the
following request due to authentication error:
ClientRequest(expectResponse=true,
callback=kafka.server.NodeToControllerRequestThread$$Lambda$850/0x00007fee184be288@41a1ff51,
destination=4, correlationId=6, clientId=4, createdTimeMs=1715893031119,
requestBuilder=ControllerRegistrationRequestData(controllerId=4,
incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097,
securityProtocol=1)], features=[Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=19)]))
(kafka.server.NodeToControllerRequestThread)
[2024-05-16 20:57:11,131] ERROR
[controller-4-to-controller-registration-channel-manager]: Request
ControllerRegistrationRequestData(controllerId=4,
incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097,
securityProtocol=1)], features=[Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=19)]) failed due to authentication
error with controller
(kafka.server.NodeToControllerRequestThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name
matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
    at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
    at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
    at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
    at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
Caused by: java.security.cert.CertificateException: No subject alternative DNS
name matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
    at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
    at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329){noformat}
Queries:
1. Given that IPs are used for communication and are present as SANs, why does
inter-broker communication work fine while broker-to-controller and
controller-to-controller communication fails?
2. Why is the controller doing a reverse DNS lookup? Is there a way to disable
that?
Note: we do not wish to set KAFKA_CFG_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=" "
as that would disable IP matching as well, per our understanding.
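To illustrate query 2, here is a small Java sketch of where a reverse lookup can come from at the JDK level. For an address built from an IP literal, `InetSocketAddress.getHostString()` returns the literal without any lookup, while `getHostName()` may perform a reverse DNS lookup and return whatever name the resolver reports. Whether Kafka's controller channel actually goes through `getHostName()` is an assumption here, not something the logs confirm. `ReverseLookupSketch` and `configuredHost` are hypothetical names.

```java
import java.net.InetSocketAddress;

/** Contrasts the lookup-free and the reverse-lookup accessors on InetSocketAddress. */
final class ReverseLookupSketch {

    /** Returns the host as configured; for an IP literal this does not trigger a reverse lookup. */
    static String configuredHost(String host, int port) {
        return new InetSocketAddress(host, port).getHostString();
    }

    public static void main(String[] args) {
        // Controller endpoint taken from the logs in this report.
        InetSocketAddress addr = new InetSocketAddress("10.87.170.83", 9097);

        // Stays the IP literal, so TLS endpoint identification would compare against IP SANs.
        System.out.println("getHostString: " + addr.getHostString());

        // May perform a reverse DNS lookup; the result depends on the local resolver.
        // If a name comes back, endpoint identification would compare against DNS SANs instead.
        System.out.println("getHostName:   " + addr.getHostName());
    }
}
```

If the peer name handed to the TLS layer is obtained the second way, a certificate that carries only the IP as a SAN would fail in the way shown in the traces above.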
Please let me know if you need any other configuration details or logs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)