[ https://issues.apache.org/jira/browse/KAFKA-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850322#comment-17850322 ]
Igor Soarez commented on KAFKA-16820:
-------------------------------------

It's not clear what the cause of the issue is, nor that it's specific to Kafka.

"ISSUE 1" and "ISSUE 2" both involve failing to connect to a controller, so they
probably have the same cause.

It may be worth making sure the IPs and DNS names in use match the cert SANs,
and perhaps double-checking that the listener configuration for the controllers
is set up correctly.

If you find there is an issue in Kafka, please share the steps to reproduce it.

> Kafka Broker fails to connect to Kraft Controller with no DNS matching
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-16820
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16820
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>    Affects Versions: 3.7.0, 3.6.1, 3.8.0
>            Reporter: Arushi Helms
>            Priority: Major
>         Attachments: Screenshot 2024-05-22 at 1.09.11 PM-1.png
>
>
> We are migrating our Kafka cluster from ZooKeeper to KRaft mode. We are
> running individual brokers and controllers with TLS enabled, and IPs are used
> for communication.
> The TLS-enabled setup works fine among the brokers, and the certificate looks
> something like:
> {noformat}
> Common Name: *.kafka.service.consul
> Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.78{noformat}
> Note:
>  * The DNS name for the node does not match the CN, but since we use IPs for
> communication, we have provided the IPs as SANs.
>  * The same applies to the controllers: their IPs are given as SANs in the
> certificate.
>  * The issue is not related to the migration, so I am only sharing the
> configuration relevant to TLS.
> In the current setup I am running 3 brokers and 3 controllers.
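The certificates described here carry a wildcard DNS SAN plus an IP SAN, while the handshake errors reported further down name a DNS host rather than an IP. The Java sketch below is a simplified, hypothetical model of TLS hostname verification in the spirit of RFC 2818 / RFC 6125; it is not the JDK's `sun.security.util.HostnameChecker`. It assumes that an IP-literal peer is checked only against IP SANs, that a hostname peer is checked only against DNS SANs, and that a `*.` wildcard covers exactly one leftmost label. Under those assumptions, `10.87.170.6` matches the controller certificate, but `cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul` does not, whatever IP SANs are present. `SanMatchSketch` and its methods are illustrative names; CN fallback, IPv6 normalization and other edge cases are omitted.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified model of TLS server identity checking.
 * Not the JDK implementation; CN fallback and IP normalization are omitted.
 */
public class SanMatchSketch {

    // Rough IP-literal test: IPv4 dotted quad, or anything containing ':' (IPv6).
    static boolean isIpLiteral(String host) {
        return host.contains(":") || host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    // DNS SAN match: exact (case-insensitive), or "*." covering exactly one leftmost label.
    static boolean matchesDns(String host, String pattern) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return h.equals(p);
        }
        String suffix = p.substring(1); // e.g. ".kafka.service.consul"
        if (!h.endsWith(suffix)) {
            return false;
        }
        String leftmost = h.substring(0, h.length() - suffix.length());
        return !leftmost.isEmpty() && !leftmost.contains(".");
    }

    // IP literals are compared with IP SANs only; hostnames with DNS SANs only.
    static boolean matches(String host, List<String> dnsSans, List<String> ipSans) {
        if (isIpLiteral(host)) {
            return ipSans.contains(host);
        }
        for (String san : dnsSans) {
            if (matchesDns(host, san)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // SANs of the controller certificate described in the issue.
        List<String> dns = List.of("*.kafka.service.consul");
        List<String> ips = List.of("10.87.170.6");
        System.out.println(matches("10.87.170.6", dns, ips)); // true
        System.out.println(matches("kfkc1.kafka.service.consul", dns, ips)); // true
        System.out.println(matches("cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul", dns, ips)); // false
    }
}
```

In this model the IP SAN only helps if the client verifies against the IP literal; once a hostname is the expected identity, the wildcard DNS SAN has to cover it.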
> *CONTROLLER:*
> Relevant controller configuration from one of the controllers:
> {noformat}
> KAFKA_CFG_PROCESS_ROLES=controller
> KAFKA_KRAFT_CLUSTER_ID=5kztjhJ4SxSu-kdiEYDUow
> KAFKA_CFG_NODE_ID=6
> KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097
> KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
> KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
> KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:SSL,INSIDE_SSL:SSL
> KAFKA_CFG_LISTENERS=CONTROLLER://10.87.170.6:9097{noformat}
> Controller certificate has:
> {noformat}
> Common Name: *.kafka.service.consul
> Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.6{noformat}
>
> *BROKER:*
> Relevant broker configuration from one of the brokers:
> {noformat}
> KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
> KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
> KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097
> KAFKA_CFG_PROCESS_ROLES=broker
> KAFKA_CFG_NODE_ID=3
> KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE_SSL:SSL,OUTSIDE_SSL:SSL,CONTROLLER:SSL
> KAFKA_CFG_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096
> KAFKA_CFG_ADVERTISED_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096{noformat}
> Broker certificate has:
> {noformat}
> Common Name: *.kafka.service.consul
> Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.78{noformat}
>
> ISSUE 1:
> With this setup, the Kafka broker fails to connect to the controller; see the
> following error:
> {noformat}
> 2024-05-22 17:53:46,413] ERROR [broker-2-to-controller-heartbeat-channel-manager]: Request BrokerRegistrationRequestData(brokerId=2, clusterId='5kztjhJ4SxSu-kdiEYDUow', incarnationId=7741fgH6T4SQqGsho8E6mw, listeners=[Listener(name='INSIDE_SSL', host='10.87.170.81', port=9093, securityProtocol=1), Listener(name='INSIDE', host='10.87.170.81', port=9094, securityProtocol=0), Listener(name='OUTSIDE', host='10.87.170.81', port=9092, securityProtocol=0), Listener(name='OUTSIDE_SSL', host='10.87.170.81', port=9096, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)], rack=null, isMigratingZkBroker=false, logDirs=[TJssfKDD-iBFYfIYCKOcew], previousBrokerEpoch=-1) failed due to authentication error with controller (kafka.server.NodeToControllerRequestThread)
> org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
> Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
>     at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
>     at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
>     at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
>     at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
>     at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
>     at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
>     at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
>     at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
>     at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
>     at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
>     at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
>     at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
> Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
>     at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
>     at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329)
>     ... 19 more{noformat}
>
> ISSUE 2:
> It looks like the KRaft controller also does a reverse DNS lookup for itself
> while starting, and we are seeing the DNS name matching issue in the
> controller as well. Log snippet from the controller with node ID 4:
> {noformat}
> [2024-05-16 20:57:07,962] INFO [SocketServer listenerType=CONTROLLER, nodeId=4] Failed authentication with /10.87.170.83 (channelId=10.87.170.83:9097-10.87.170.83:42548-3) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
> [2024-05-16 20:57:11,118] INFO [ControllerRegistrationManager id=4 incarnation=HWT3UBxJSPGuefZ9xdqH-g] sendControllerRegistration: attempting to send ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)]) (kafka.server.ControllerRegistrationManager)
> [2024-05-16 20:57:11,129] INFO [NodeToControllerChannelManager id=4 name=registration] Failed authentication with cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83 (channelId=4) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
> [2024-05-16 20:57:11,130] INFO [NodeToControllerChannelManager id=4 name=registration] Node 4 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2024-05-16 20:57:11,130] INFO [SocketServer listenerType=CONTROLLER, nodeId=4] Failed authentication with /10.87.170.83 (channelId=10.87.170.83:9097-10.87.170.83:42564-4) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
> [2024-05-16 20:57:11,130] ERROR [NodeToControllerChannelManager id=4 name=registration] Connection to node 4 (cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83:9097) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
> [2024-05-16 20:57:11,131] ERROR [controller-4-to-controller-registration-channel-manager]: Failed to send the following request due to authentication error: ClientRequest(expectResponse=true, callback=kafka.server.NodeToControllerRequestThread$$Lambda$850/0x00007fee184be288@41a1ff51, destination=4, correlationId=6, clientId=4, createdTimeMs=1715893031119, requestBuilder=ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)])) (kafka.server.NodeToControllerRequestThread)
> [2024-05-16 20:57:11,131] ERROR [controller-4-to-controller-registration-channel-manager]: Request ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)]) failed due to authentication error with controller (kafka.server.NodeToControllerRequestThread)
> org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
> Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
>     at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
>     at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
>     at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
>     at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
>     at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
>     at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
>     at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
>     at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
>     at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
>     at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
>     at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
>     at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
>     at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
> Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
>     at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
>     at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
>     at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
>     at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329){noformat}
> Queries:
> 1. Given IPs for communication and IPs as SANs, why does inter-broker
> communication work fine but broker-controller and controller-controller
> communication does not?
> 2. Why is the controller doing a reverse DNS lookup? Is there a way to
> disable that?
> Note: we do not wish to set KAFKA_CFG_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=" "
> because, per our understanding, it would disable IP matching as well.
> Please let me know if you would like to see any other configuration or logs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
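Query 2 asks where the hostname in the failing handshakes comes from. Which host string Kafka passes to the TLS layer is not established in this thread, so the Java sketch below only illustrates two JDK behaviors that seem relevant; it is not a description of Kafka's code. First, `InetSocketAddress.getHostString()` returns the literal IP when no hostname is attached, without a reverse DNS lookup, whereas `getHostName()` may trigger one. The `cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83` form in the ISSUE 2 log matches Java's `hostname/address` rendering, which suggests (an assumption) that a hostname had become attached to that address. Second, a client-mode `SSLEngine` created with `createSSLEngine(peerHost, port)` and `HTTPS` endpoint identification verifies the server certificate against `peerHost`; Kafka's default for `ssl.endpoint.identification.algorithm` is `https`. `PeerHostSketch` and its helper methods are hypothetical names.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class PeerHostSketch {

    // Hypothetical helper: the controller address from the issue, built from raw bytes.
    // InetAddress.getByAddress performs no forward or reverse DNS lookup.
    static InetAddress controllerAddress() {
        try {
            return InetAddress.getByAddress(new byte[] {10, 87, (byte) 170, 6});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // getHostString() returns the literal IP when no hostname is attached,
    // without a reverse lookup; getHostName() may perform a reverse lookup instead.
    static String peerHostWithoutReverseLookup(InetAddress addr, int port) {
        return new InetSocketAddress(addr, port).getHostString();
    }

    // Client-mode SSLEngine with HTTPS endpoint identification: during the handshake
    // the server certificate's SANs are checked against peerHost.
    static SSLEngine clientEngine(String peerHost, int port) {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine(peerHost, port);
            engine.setUseClientMode(true);
            SSLParameters params = engine.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            engine.setSSLParameters(params);
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = controllerAddress();
        System.out.println(addr); // "/10.87.170.6": no hostname attached
        String peerHost = peerHostWithoutReverseLookup(addr, 9097);
        SSLEngine engine = clientEngine(peerHost, 9097);
        System.out.println(engine.getPeerHost()); // "10.87.170.6"
        System.out.println(engine.getSSLParameters().getEndpointIdentificationAlgorithm()); // "HTTPS"
    }
}
```

With an IP literal as the peer host, verification would go against the certificate's IP SANs; with a resolved name such as the one in the ISSUE 2 log, it goes against the DNS SANs instead.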