Martin Schneppenheim created KAFKA-12797:
--------------------------------------------
Summary: Quota to mitigate impact of clients that leak Fetch
session slots
Key: KAFKA-12797
URL: https://issues.apache.org/jira/browse/KAFKA-12797
Project: Kafka
Issue Type: New Feature
Affects Versions: 2.8.0
Reporter: Martin Schneppenheim
*Motivation*
KIP-227 introduced fetch sessions, and with them a fetch session cache that
each broker maintains. The cache is limited to 1,000 slots by default
(max.incremental.fetch.session.cache.slots) and is shared among all clients
that fetch from that broker.
In a multi-tenant environment with hundreds or thousands of different clients,
misbehaving clients (e.g. Sarama v1.26.0) may leak fetch sessions excessively.
This can lead to high eviction rates of fetch sessions on the broker side.
Other clients will likely be impacted by this because their fetch sessions can
no longer be found in the fetch session cache; in practice, log messages like
these will pop up:
{noformat}
Node <number> was unable to process the fetch request with
(sessionId=<some-number>, epoch=<some-other-number>):
FETCH_SESSION_ID_NOT_FOUND.
{noformat}
As an operator, I don't know how to identify the clients / SASL users that use
the most sessions, nor do I have an option to mitigate the impact of clients
that create many fetch sessions. The absence of a quota can be exploited by
attackers in untrusted multi-tenant environments.
*Proposal*
While I'm not really familiar with the Kafka code, I assume that a new quota
could be introduced that limits how many fetch session slots a client can
maintain (or create in a certain time window).
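To make the idea concrete, here is a minimal sketch of a per-principal cap on
fetch session slots. It is only an illustration under assumptions: the class
FetchSessionQuota and its methods are hypothetical and do not exist in Kafka,
and the principal is taken to be a plain string such as "User:alice". How such
a check would hook into the broker's fetch session cache is left open.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-principal cap on fetch session slots; not a Kafka class. */
final class FetchSessionQuota {
    private final int maxSlotsPerPrincipal;
    private final ConcurrentMap<String, Integer> slots = new ConcurrentHashMap<>();

    FetchSessionQuota(int maxSlotsPerPrincipal) {
        if (maxSlotsPerPrincipal < 1) {
            throw new IllegalArgumentException("maxSlotsPerPrincipal must be >= 1");
        }
        this.maxSlotsPerPrincipal = maxSlotsPerPrincipal;
    }

    /** Reserves one slot and returns true if the principal is below its cap. */
    boolean tryAcquire(String principal) {
        boolean[] granted = {false};
        // compute() runs atomically per key, so the check and increment cannot race.
        slots.compute(principal, (key, used) -> {
            int current = used == null ? 0 : used;
            if (current >= maxSlotsPerPrincipal) {
                return used; // cap reached, leave the count unchanged
            }
            granted[0] = true;
            return current + 1;
        });
        return granted[0];
    }

    /** Frees one slot, e.g. when a session is closed or evicted. */
    void release(String principal) {
        // Returning null removes the entry once the last slot is released.
        slots.computeIfPresent(principal, (key, used) -> used <= 1 ? null : used - 1);
    }

    /** Number of slots the principal currently holds. */
    int slotsInUse(String principal) {
        return slots.getOrDefault(principal, 0);
    }
}
```

With a cap like this, a client that leaks sessions would only exhaust its own
allowance and fall back to full fetch requests, instead of evicting the
sessions of other tenants.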
Additionally, it would be nice to have metrics for the number of fetch session
slots created/maintained per SASL user and/or client ID. This way operators
can inform the owners of misbehaving clients about the fetch session problems,
which are likely caused by improper client implementations.
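The monitoring part could be sketched along these lines. Again, this is an
assumption-laden illustration rather than existing Kafka code: the class
FetchSessionUsage and its method names are made up, and a real implementation
would presumably expose the counts through the broker's metrics instead of an
in-memory map.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

/** Hypothetical counter of fetch sessions created per principal; not a Kafka class. */
final class FetchSessionUsage {
    private final Map<String, LongAdder> created = new ConcurrentHashMap<>();

    /** Call whenever a new fetch session is created for the given principal. */
    void recordSessionCreated(String principal) {
        created.computeIfAbsent(principal, key -> new LongAdder()).increment();
    }

    /** Total number of sessions recorded for the principal so far. */
    long createdBy(String principal) {
        LongAdder count = created.get(principal);
        return count == null ? 0L : count.sum();
    }

    /** Principals that created the most sessions, highest count first. */
    List<String> topCreators(int n) {
        return created.entrySet().stream()
            .sorted(Comparator.comparingLong(
                (Map.Entry<String, LongAdder> e) -> e.getValue().sum()).reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

An operator could poll topCreators(10) periodically to spot the SASL users or
client IDs that churn through sessions fastest.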
cc [~dajac] [~gwenshap]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)