Sandeep created KAFKA-15165:
-------------------------------
Summary: Handle Kafka client certificate failures without
impacting brokers
Key: KAFKA-15165
URL: https://issues.apache.org/jira/browse/KAFKA-15165
Project: Kafka
Issue Type: Improvement
Components: core, security
Affects Versions: 2.8.1
Environment: production
Reporter: Sandeep
The following situation was observed in production:
Consumer or producer SSL certificates expired because their renewal was
mismanaged. When these clients try to connect to read or publish messages,
they get authentication failures. The clients keep retrying, which drives up
broker CPU utilisation and in turn affects other healthy clients connected to
the brokers.
Broker CPU utilisation rose from 35% to 85-90%. Healthy clients saw publish
and consume latencies spike to multiple seconds.
This situation effectively amounts to a denial-of-service attack on the Kafka
cluster.
We must handle this gracefully, by either:
1) Client-side changes: stop clients from reconnecting, or make them retry
with exponential backoff, after SSL certificate authentication fails.
2) Broker-side changes: let the broker blacklist clients for a certain
duration, with an override once the certs are renewed.
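
To make the two options concrete, here is a minimal Java sketch of what each
could look like. It is illustrative only: the class AuthFailureHandling, the
method backoffMs, the nested Blocklist type, and the idea of keying the
blocklist by a client identifier such as source address or certificate
subject are all hypothetical and are not existing Kafka APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names exist in Apache Kafka. */
public class AuthFailureHandling {

    /**
     * Option 1 (client side): delay before retry number `attempt` (0-based).
     * The delay starts at baseMs and doubles per attempt, capped at maxMs.
     */
    public static long backoffMs(int attempt, long baseMs, long maxMs) {
        if (attempt < 0 || baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            // Double without overflowing; jump to the cap once doubling would pass it.
            delay = (delay <= maxMs / 2) ? delay * 2 : maxMs;
        }
        return delay;
    }

    /**
     * Option 2 (broker side): temporary blocklist keyed by an assumed client
     * identifier (for example source address or certificate subject).
     */
    public static final class Blocklist {
        private final long blockMs;
        private final Map<String, Long> blockedUntil = new ConcurrentHashMap<>();

        public Blocklist(long blockMs) {
            this.blockMs = blockMs;
        }

        /** Record a failed authentication and block the client for blockMs. */
        public void recordAuthFailure(String clientKey, long nowMs) {
            blockedUntil.put(clientKey, nowMs + blockMs);
        }

        /** True while the client's block has not yet expired. */
        public boolean isBlocked(String clientKey, long nowMs) {
            Long until = blockedUntil.get(clientKey);
            if (until == null) {
                return false;
            }
            if (nowMs >= until) {
                // Block expired; drop the entry so the map does not grow unbounded.
                blockedUntil.remove(clientKey, until);
                return false;
            }
            return true;
        }

        /** Manual override, e.g. once the client's certs have been renewed. */
        public void unblock(String clientKey) {
            blockedUntil.remove(clientKey);
        }
    }
}
```

In this sketch the broker would check isBlocked before starting the SSL
handshake, so a blocked client is rejected cheaply, and an operator would call
unblock after certificate renewal. A real implementation would also need
jitter on the client backoff and a bound on the blocklist size.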
--
This message was sent by Atlassian Jira
(v8.20.10#820010)