[jira] [Created] (KAFKA-20109) Complete Kafka cluster dies on incorrect SSL config of a single controller

Sven Dewit (Jira) Sun, 01 Feb 2026 23:09:27 -0800

Sven Dewit created KAFKA-20109:
----------------------------------

             Summary: Complete Kafka cluster dies on incorrect SSL config of a 
single controller
                 Key: KAFKA-20109
                 URL: https://issues.apache.org/jira/browse/KAFKA-20109
             Project: Kafka
          Issue Type: Bug
          Components: config, controller
    Affects Versions: 4.1.1
         Environment: Debian trixie x86_64, Apache Kafka 3.9.0 - 4.1.1
            Reporter: Sven Dewit
         Attachments: reproduce.tar.gz


Hello,

we've recently run into a bug in Apache Kafka in KRaft mode where a whole 
mTLS-enabled cluster (controllers + brokers) dies if a single controller is 
(re)started with bad SSL principal mapping rules.

The bad config was of course applied unintentionally while making changes to 
the system's config management. In effect, {{ssl.principal.mapping.rules}} 
ended up missing for the controller listener on that one node. As soon as this 
single controller was restarted, the whole cluster died within seconds, both 
controllers and brokers, with this error message:
{code:java}
ERROR Encountered fatal fault: Unexpected error in raft IO thread 
(org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
org.apache.kafka.common.errors.ClusterAuthorizationException: Received cluster 
authorization error in response InboundResponse(correlationId=493, 
data=BeginQuorumEpochResponseData(errorCode=31, topics=[], nodeEndpoints=[]), 
source=controller-3:9093 (id: 103 rack: null isFenced: false)) {code}
While a missing or bad SSL principal mapping is a major misconfiguration on a 
cluster where in-cluster communication is based on mTLS, it still should not 
lead to the whole cluster terminating.
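
To illustrate the likely mechanism, here is a minimal Python sketch. It is not 
Kafka's implementation, and the DN, the rule and the set of authorized 
principals are hypothetical (the real values are in the attached tarball). It 
assumes that without any mapping rule the full certificate DN is used as the 
principal name, so the misconfigured controller no longer recognizes its peers 
and answers their requests with error code 31 (CLUSTER_AUTHORIZATION_FAILED).

```python
import re


def map_principal(dn, rules):
    """Map a certificate DN to a principal name.

    `rules` is a list of (pattern, replacement) pairs tried in order.
    An empty list models the default behaviour: the full DN is returned.
    Replacements use $1-style group references, as in Kafka's rule syntax.
    """
    for pattern, replacement in rules:
        match = re.fullmatch(pattern, dn)
        if match:
            # Translate $1-style references into Python's \g<1> form.
            py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            return match.expand(py_replacement)
    return dn


# Hypothetical values, for illustration only.
PEER_DN = "CN=controller-1,OU=kafka,O=example"
GOOD_RULES = [(r"CN=([^,]+).*", "$1")]  # keep only the CN
AUTHORIZED = {"User:controller-1", "User:controller-2", "User:controller-3"}


def is_authorized(dn, rules):
    """Check whether the mapped principal is in the authorized set."""
    return "User:" + map_principal(dn, rules) in AUTHORIZED


if __name__ == "__main__":
    print(is_authorized(PEER_DN, GOOD_RULES))  # rule present
    print(is_authorized(PEER_DN, []))          # rule missing
```

With the rule in place the peer maps to {{controller-1}} and is accepted. With 
the rule missing it maps to the full DN and is rejected, which would match the 
response from {{controller-3:9093}} in the log above.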

The issue occurred on version 4.1.1 of Apache Kafka, but could be reproduced 
back to 3.9.0.

To reproduce, see the attached tarball, which contains:
 * {{gen-test-ca-and-certs.sh}} to create the CA and the certificates that 
brokers and controllers need to work in mTLS mode
 * {{compose.yml}} to spin up the cluster with {{podman compose}}

Once the cluster is running, the following steps reproduce the error:
 * {{podman compose down controller-3}} to stop controller 3
 * uncomment line 53 in {{compose.yml}} to delete controller 3's 
{{ssl.principal.mapping.rules}}
 * {{podman compose up controller-3}} and watch the cluster go down the drain

 

If I can provide any more information or support, don't hesitate to reach out 
to me.

 

Best regards,

Sven



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
