[ https://issues.apache.org/jira/browse/KAFKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edoardo Comar updated KAFKA-4206:
---------------------------------
Description:
The current handling of invalid credentials (i.e. wrong user/password) is to let the {{SaslException}} thrown from an implementation of {{javax.security.sasl.SaslServer.evaluateResponse()}} bubble up the call stack until it is caught in {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}, where the {{KafkaChannel}} is closed, which disconnects the client that made the request.
However, this only happens after the server has already used considerable resources, especially for the SSL handshake, which appears to be computationally expensive in Java.
We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes.
This makes a Kafka cluster easily vulnerable to a Denial of Service attack, even an unintentional one.
The attack can be unintentional, i.e. caused by friendly clients: for example, a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising.
An easy fix, which we have implemented and will supply a PR for, is to considerably *delay* closing the {{KafkaChannel}} in the {{Selector}}, without, of course, blocking the processing thread.
This has been tested to be very effective in reducing the CPU usage spikes caused by non-malicious clients using invalid SASL PLAIN credentials over SSL.

was:
The current handling of invalid credentials (i.e. wrong user/password) is to let the {{SaslException}} thrown from an implementation of {{javax.security.sasl.SaslServer.evaluateResponse()}} bubble up the call stack until it is caught in {{org.apache.kafka.common.network.Selector.pollSelectionKeys()}}, where the `KafkaChannel` is closed, which disconnects the client that made the request.
However, this only happens after the server has already used considerable resources, especially for the SSL handshake, which appears to be computationally expensive in Java.
We have observed that if just a few clients keep repeating requests with the wrong credentials, it is quite easy to get all the network processing threads in the Kafka server busy doing SSL handshakes.
This makes a Kafka cluster easily vulnerable to a Denial of Service attack, even an unintentional one.
The attack can be unintentional, i.e. caused by friendly clients: for example, a Kafka Java client Producer supplied with the wrong credentials will not throw an exception on publishing, so it may keep attempting to connect without the caller realising.
An easy fix, which we have implemented and will supply a PR for, is to considerably *delay* closing the `KafkaChannel` in the `Selector`, without, of course, blocking the processing thread.
This has been tested to be very effective in reducing the CPU usage spikes caused by non-malicious SSL clients using invalid credentials.

> Improve handling of invalid credentials to mitigate DOS issue (especially on SSL listeners)
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4206
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4206
>             Project: Kafka
>          Issue Type: Improvement
>          Components: network, security
>    Affects Versions: 0.10.0.0, 0.10.0.1
>            Reporter: Edoardo Comar
>            Assignee: Edoardo Comar
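The *delayed close* proposed in the description can be pictured as a queue of pending closes that the network thread drains on every poll iteration, so a failed client is disconnected later without the thread ever sleeping. The sketch below is a minimal illustration under stated assumptions, not the code from the PR: {{DelayedChannelCloser}}, {{scheduleClose}}, {{closeExpired}} and the 1000 ms delay are hypothetical; channels are represented by string ids and the actual close by a callback; the caller is assumed to pass a non-decreasing clock in milliseconds. A real {{Selector}} would presumably also need to stop reading from the channel while it waits (for example by clearing {{SelectionKey.OP_READ}} from its interest set), which the sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Kafka: instead of closing a channel as soon
// as SASL authentication fails, remember it with a deadline and close it later
// from the same non-blocking poll loop.
final class DelayedChannelCloser {
    // A failed channel id paired with the earliest time (ms) at which it may be closed.
    private static final class Pending {
        final String channelId;
        final long closeAtMs;

        Pending(String channelId, long closeAtMs) {
            this.channelId = channelId;
            this.closeAtMs = closeAtMs;
        }
    }

    private final long delayMs;
    private final Consumer<String> closeAction;
    // FIFO is enough: every entry gets the same delay, so deadlines stay in
    // insertion order as long as callers pass a non-decreasing clock.
    private final Deque<Pending> pending = new ArrayDeque<>();

    DelayedChannelCloser(long delayMs, Consumer<String> closeAction) {
        this.delayMs = delayMs;
        this.closeAction = closeAction;
    }

    // Call where the SaslException would otherwise close the channel immediately.
    void scheduleClose(String channelId, long nowMs) {
        pending.addLast(new Pending(channelId, nowMs + delayMs));
    }

    // Call once per poll iteration: never sleeps, only closes channels that are due.
    List<String> closeExpired(long nowMs) {
        List<String> closed = new ArrayList<>();
        while (!pending.isEmpty() && pending.peekFirst().closeAtMs <= nowMs) {
            String id = pending.pollFirst().channelId;
            closeAction.accept(id);
            closed.add(id);
        }
        return closed;
    }

    int pendingCount() {
        return pending.size();
    }
}

public class DelayedCloseDemo {
    public static void main(String[] args) {
        List<String> closedLog = new ArrayList<>();
        // Illustrative 1000 ms delay; the issue does not state the value used.
        DelayedChannelCloser closer = new DelayedChannelCloser(1000, closedLog::add);

        // Two clients fail SASL authentication at t=0 ms and t=200 ms.
        closer.scheduleClose("client-a", 0);
        closer.scheduleClose("client-b", 200);

        System.out.println(closer.closeExpired(500));  // [] : nothing due yet
        System.out.println(closer.closeExpired(1000)); // [client-a]
        System.out.println(closer.closeExpired(1200)); // [client-b]
        System.out.println(closedLog);                 // [client-a, client-b]
    }
}
```

A plain FIFO deque keeps scheduling and expiry O(1) per channel because all closes share one delay; if per-client delays were ever needed (e.g. growing with repeated failures), a {{PriorityQueue}} ordered by deadline would be the natural swap.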