Sandeep created KAFKA-15165:
-------------------------------

             Summary: Handle Kafka client certificate failures without 
impacting brokers
                 Key: KAFKA-15165
                 URL: https://issues.apache.org/jira/browse/KAFKA-15165
             Project: Kafka
          Issue Type: Improvement
          Components: core, security
    Affects Versions: 2.8.1
         Environment: production
            Reporter: Sandeep


The following situation is observed in production:

Consumer or producer SSL certificates expired because their renewal was 
mismanaged. When these clients try to connect to consume or publish messages, 
they get authentication failures. The clients keep retrying, which drives up 
broker CPU utilisation and in turn impacts other healthy clients connected to 
the same brokers.

Broker CPU utilisation was observed to increase from 35% to 85-90%. Healthy 
clients see publish and consume latencies spike to multiple seconds.

This situation effectively acts as a denial-of-service attack on the Kafka 
cluster.

We must handle this gracefully, by either:

1) Not allowing clients to reconnect, or making them retry with exponential 
backoff, after they fail to authenticate using SSL certs

2) Broker-side changes, where the broker can blacklist clients for a certain 
duration, which can be overridden after the certs are renewed.
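
To make the two options more concrete, here is a minimal sketch of a 
time-bounded blacklist whose block duration doubles with each consecutive 
authentication failure and can be cleared manually once certs are renewed. 
This is only an illustration, not existing Kafka code: the class 
AuthFailureBlocklist, its methods, and the choice of key (for example a source 
IP or certificate subject) are all hypothetical, and the caller passes in the 
current time so the behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Kafka.
final class AuthFailureBlocklist {
    private static final class Entry {
        int failures;         // consecutive authentication failures
        long blockedUntilMs;  // connections are refused before this time
    }

    private final long baseBlockMs; // block duration after the first failure
    private final long maxBlockMs;  // upper bound on any block duration
    private final Map<String, Entry> entries = new HashMap<>();

    AuthFailureBlocklist(long baseBlockMs, long maxBlockMs) {
        if (baseBlockMs <= 0 || maxBlockMs < baseBlockMs) {
            throw new IllegalArgumentException("need 0 < baseBlockMs <= maxBlockMs");
        }
        this.baseBlockMs = baseBlockMs;
        this.maxBlockMs = maxBlockMs;
    }

    // Record a failed handshake; returns the block duration now in effect.
    synchronized long recordFailure(String clientKey, long nowMs) {
        Entry e = entries.computeIfAbsent(clientKey, k -> new Entry());
        e.failures++;
        // Double the duration per consecutive failure, capped at maxBlockMs.
        // Comparing against the shifted maximum avoids overflowing the long.
        int shift = Math.min(e.failures - 1, 62);
        long blockMs = baseBlockMs > (maxBlockMs >> shift)
                ? maxBlockMs
                : baseBlockMs << shift;
        e.blockedUntilMs = nowMs + blockMs;
        return blockMs;
    }

    // True while the client is still inside its block window.
    synchronized boolean isBlocked(String clientKey, long nowMs) {
        Entry e = entries.get(clientKey);
        return e != null && nowMs < e.blockedUntilMs;
    }

    // Manual override, e.g. after the client's certs have been renewed.
    synchronized void clear(String clientKey) {
        entries.remove(clientKey);
    }
}
```

With this shape the broker would call isBlocked before starting the SSL 
handshake and recordFailure when the handshake fails, so a client with an 
expired cert costs one cheap lookup per attempt instead of a full handshake. 
Whether the check belongs in the broker or in front of it is left open here.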



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
