[ https://issues.apache.org/jira/browse/KAFKA-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Chen updated KAFKA-13474:
------------------------------
    Fix Version/s: 3.2.1

> Regression in dynamic update of broker certificate
> --------------------------------------------------
>
>                 Key: KAFKA-13474
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13474
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.7.0, 3.1.0, 2.7.2, 2.8.1, 3.0.0, 3.2.0, 3.1.1, 3.2.1
>            Reporter: Igor Shipenkov
>            Assignee: Divij Vaidya
>            Priority: Critical
>             Fix For: 3.3.0, 3.2.1
>
>         Attachments: failed-controller-single-session-20211119.pcap.gz
>
> h1. Problem
> It seems that after updating a listener SSL certificate through a dynamic broker configuration update, the old certificate is somehow still used by the broker's client-side SSL factory. Because of this, the broker fails to create a new connection to the controller once the old certificate expires.
> h1. History
> Back in KAFKA-8336 there was an issue where the client-side SSL factory wasn't updating its certificate when it was changed with dynamic configuration. That bug was fixed in version 2.3, and I can confirm that the dynamic update worked for us with Kafka 2.4. But now we have updated our clusters to 2.7 and see this (or at least a similar) problem again.
> h1. Affected versions
> We first saw this on Confluent 6.1.2, which (I think) is based on Kafka 2.7.0. I then tried vanilla versions 2.7.0 and 2.7.2 and can reproduce the problem on them just fine.
> h1. How to reproduce
> * Have a ZooKeeper somewhere (in my example it will be "10.88.0.21:2181").
> * Get vanilla version 2.7.2 (or 2.7.0) from [https://kafka.apache.org/downloads].
> * Make a basic broker config like this (don't forget to actually create log.dirs):
> {code:none}
> broker.id=1
> listeners=SSL://:9092
> advertised.listeners=SSL://localhost:9092
> log.dirs=/tmp/broker1/data
> zookeeper.connect=10.88.0.21:2181
> security.inter.broker.protocol=SSL
> ssl.protocol=TLSv1.2
> ssl.client.auth=required
> ssl.endpoint.identification.algorithm=
> ssl.keystore.type=PKCS12
> ssl.keystore.location=/tmp/broker1/secrets/broker1.keystore.p12
> ssl.keystore.password=changeme1
> ssl.key.password=changeme1
> ssl.truststore.type=PKCS12
> ssl.truststore.location=/tmp/broker1/secrets/truststore.p12
> ssl.truststore.password=changeme
> {code}
> (I use TLS 1.2 here just so I can see the client certificate in the TLS handshake in a traffic dump; you will get the same error with the default TLS 1.3 too.)
> ** Repeat this config for the other 2 brokers, changing the id, listener port and certificate accordingly.
> * Make a basic client config (I use one of the brokers' certificates for it):
> {code:none}
> security.protocol=SSL
> ssl.key.password=changeme1
> ssl.keystore.type=PKCS12
> ssl.keystore.location=/tmp/broker1/secrets/broker1.keystore.p12
> ssl.keystore.password=changeme1
> ssl.truststore.type=PKCS12
> ssl.truststore.location=/tmp/broker1/secrets/truststore.p12
> ssl.truststore.password=changeme
> ssl.endpoint.identification.algorithm=
> {code}
> * Create the usual local self-signed PKI for the test:
> ** generate a self-signed CA certificate and private key, and place the certificate in the truststore.
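The CA sub-step above might look like the following sketch (subject and file names are hypothetical placeholders, not from the report; the truststore import is shown with the JDK's keytool, since the broker configs expect a PKCS12 store):

```shell
# Create a throwaway CA: a self-signed certificate plus private key.
# "/CN=test-ca" is an arbitrary placeholder subject.
mkdir -p ca
openssl req -new -x509 -days 365 -nodes -subj "/CN=test-ca" \
  -keyout ca/ca-key.pem -out ca/ca-cert.pem

# The CA certificate then goes into the shared PKCS12 truststore, e.g.:
#   keytool -importcert -noprompt -alias ca -file ca/ca-cert.pem \
#     -keystore truststore.p12 -storetype PKCS12 -storepass changeme
```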
> ** create keys for the broker certificates and create signing requests from them as usual (I'll use the same subject for all brokers here).
> ** create 2 certificates as usual:
> {code:bash}
> openssl x509 \
>   -req -CAcreateserial -days 1 \
>   -CA ca/ca-cert.pem -CAkey ca/ca-key.pem \
>   -in broker1.csr -out broker1.crt
> {code}
> ** Use the "faketime" utility to make the third certificate expire soon:
> {code:bash}
> # the date here is some point yesterday, so the certificate will expire like 10-15 minutes from now
> faketime "2021-11-23 10:15" openssl x509 \
>   -req -CAcreateserial -days 1 \
>   -CA ca/ca-cert.pem -CAkey ca/ca-key.pem \
>   -in broker2.csr -out broker2.crt
> {code}
> ** create keystores from the certificates and place them according to the broker configs from earlier.
> * Run the 3 brokers with your configs, like
> {code:bash}
> ./bin/kafka-server-start.sh server2.properties
> {code}
> (I start it here without daemon mode to see the logs right in the terminal - just use "tmux" or something to run the 3 brokers simultaneously.)
> ** You can check that one broker certificate will expire soon with
> {code:bash}
> openssl s_client -connect localhost:9093 </dev/null | openssl x509 -noout -text | grep -A2 Valid
> {code}
> * Issue a new certificate to replace the one which will expire soon, place it in a keystore, and replace the old keystore with it.
> * Use dynamic configuration to make the broker re-read the keystore:
> {code:bash}
> ./bin/kafka-configs --command-config ssl.properties --bootstrap-server localhost:9092 --entity-type brokers --entity-name "2" --alter --add-config "listener.name.SSL.ssl.keystore.location=/tmp/broker2/secrets/broker2.keystore.p12"
> {code}
> ** You can check that the broker now has the new certificate on its listener with the same command:
> {code:bash}
> openssl s_client -connect localhost:9093 </dev/null | openssl x509 -noout -text | grep -A2 Valid
> {code}
> * Wait until the old certificate expires and make some changes which provoke the broker to open a new controller connection.
For example, if the controller is on broker "1" and the expired certificate was on broker "2", then I restart broker "3".
> * On the broker with the expired certificate you will see something like this in the log:
> {code:none}
> INFO [broker-2-to-controller-send-thread]: Recorded new controller, from now on will use broker 1 (kafka.server.BrokerToControllerRequestThread)
> INFO [broker-2-to-controller] Failed authentication with localhost/127.0.0.1 (SSL handshake failed) (org.apache.kafka.common.network.Selector)
> ERROR [broker-2-to-controller] Connection to node 1 (localhost/127.0.0.1:9092) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
> ERROR [broker-2-to-controller-send-thread]: Failed to send the following request due to authentication error: ClientRequest(expectResponse=true, callback=kafka.server.BrokerToControllerRequestThread$$Lambda$996/0x0000000801724c40@4d3e77ce, destination=1, correlationId=626, clientId=2, createdTimeMs=1637718291682, requestBuilder=AlterIsrRequestData(brokerId=2, brokerEpoch=293, topics=<some topic topology> kafka.server.BrokerToControllerRequestThread)
> {code}
> and the controller log will show something like
> {code:none}
> INFO [SocketServer brokerId=1] Failed authentication with /127.0.0.1 (SSL handshake failed) (org.apache.kafka.common.network.Selector)
> {code}
> If the broker with the expired and changed certificate was the controller itself, it can't even connect to itself.
> * If you make a traffic dump (and use TLS 1.2 or lower), you will see that the broker's client connection tries to use the old certificate in the TLS handshake. Here is an example traffic dump where the broker with the expired and dynamically changed certificate is the current controller, so it can't connect to itself: [^failed-controller-single-session-20211119.pcap.gz]
> In this example you will see that the "Server" uses the new certificate while the "Client" uses the old certificate - but it's the same broker!
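A scriptable way to confirm the mismatch the dump shows is to compare SHA-256 fingerprints: the certificate the listener serves (captured with s_client, as in the steps above) against the one actually sitting in the broker's keystore file. A self-contained sketch with a throwaway key and PKCS12 store (all names here are hypothetical, not the broker2 files from the report):

```shell
# Build a throwaway keystore of the same shape as broker2.keystore.p12.
openssl req -new -x509 -days 1 -nodes -subj "/CN=demo" \
  -keyout demo.key -out demo.crt
openssl pkcs12 -export -inkey demo.key -in demo.crt \
  -out demo.p12 -passout pass:changeme1

# Fingerprint of the certificate inside the keystore file. Against a live
# broker you would fingerprint the s_client output the same way and compare:
# matching fingerprints mean the listener really serves the keystore's cert.
openssl pkcs12 -in demo.p12 -passin pass:changeme1 -nokeys 2>/dev/null \
  | openssl x509 -noout -fingerprint -sha256
```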
-- This message was sent by Atlassian Jira (v8.20.10#820010)