Bruno Cadonna created KAFKA-10484:
-------------------------------------
Summary: Reduce Metrics Exposed by Streams
Key: KAFKA-10484
URL: https://issues.apache.org/jira/browse/KAFKA-10484
Project: Kafka
Issue Type: Improvement
Components: streams
Affects Versions: 2.6.0
Reporter: Bruno Cadonna
In our test cluster, metrics are monitored through a monitoring service. A
couple of times, a Kafka Streams client exceeded the monitoring service's limit
of 350 metrics. When a client exceeds the limit, metrics are truncated, which
might result in false alerts. For example, in our cluster we monitor the alive
stream threads and trigger an alert if a stream thread dies. When a client
exceeded the 350-metrics limit, the alive stream threads metric was truncated,
which led to a false alarm.
The main drivers of the high number of metrics are the metrics on task level
and below, such as the state store metrics. The number of such metrics per
Kafka Streams client is hard to predict, since it depends on which tasks are
assigned to the client. A stateful task with 5 state stores reports 5 times
more state store metrics than a stateful task with only one state store.
Sometimes it is possible to report the metrics of only some state stores, but
sometimes this is not an option. For example, to monitor the memory usage of
RocksDB per Kafka Streams client, we need to report the memory-related metrics
of all RocksDB state stores of all tasks assigned to all stream threads of the
client.
One option to reduce the number of reported metrics is to add client-level
metrics within Kafka Streams that aggregate some state store metrics, e.g., to
monitor memory usage.
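The proposed aggregation could be sketched as follows. This is a minimal,
self-contained illustration only, not the actual Streams implementation: the
metric name `size-all-memtables` and the `task.store.metric` naming scheme are
illustrative assumptions, and the map of values stands in for what the metrics
registry would report per store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientLevelMemoryAggregation {

    // Sums the values of all per-store metrics whose name ends with the given
    // memory-related metric name, regardless of which state store or task
    // reported them, yielding a single client-level value.
    static double aggregateMemoryUsage(final Map<String, Double> storeMetrics,
                                       final String memoryMetricName) {
        return storeMetrics.entrySet().stream()
            .filter(e -> e.getKey().endsWith(memoryMetricName))
            .mapToDouble(Map.Entry::getValue)
            .sum();
    }

    public static void main(final String[] args) {
        // Hypothetical per-store values as three RocksDB state stores
        // assigned to one client might report them.
        final Map<String, Double> storeMetrics = new LinkedHashMap<>();
        storeMetrics.put("task-0_1.store-a.size-all-memtables", 64.0);
        storeMetrics.put("task-0_2.store-b.size-all-memtables", 128.0);
        storeMetrics.put("task-1_0.store-c.size-all-memtables", 32.0);

        // One client-level metric replaces three store-level ones.
        System.out.println(
            aggregateMemoryUsage(storeMetrics, "size-all-memtables"));
    }
}
```

With such a client-level aggregate exposed, the monitoring service would see
one memory metric per client instead of one per state store, which scales with
the number of clients rather than with the task assignment.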
--
This message was sent by Atlassian Jira
(v8.3.4#803005)