Bruno Cadonna created KAFKA-10484:
-------------------------------------

             Summary: Reduce Metrics Exposed by Streams
                 Key: KAFKA-10484
                 URL: https://issues.apache.org/jira/browse/KAFKA-10484
             Project: Kafka
          Issue Type: Improvement
          Components: streams
    Affects Versions: 2.6.0
            Reporter: Bruno Cadonna


In our test cluster, metrics are monitored through a monitoring service. A 
couple of times, a Kafka Streams client exceeded the service's limit of 350 
metrics. When a client exceeds the limit, metrics are truncated, which might 
result in false alerts. For example, in our cluster we monitor the alive 
stream threads and trigger an alert if a stream thread dies. When the client 
exceeded the 350-metrics limit, the alive stream threads metric was truncated, 
which led to a false alarm.

The main drivers of the high number of metrics are the metrics on task level 
and below, for example the state store metrics. The number of such metrics per 
Kafka Streams client is hard to predict since it depends on which tasks are 
assigned to the client. A stateful task with 5 state stores reports 5 times 
more state store metrics than a stateful task with only one state store. 
Sometimes it is possible to report the metrics of only some state stores, but 
sometimes this is not an option. For example, if we want to monitor the memory 
usage of RocksDB per Kafka Streams client, we need to report the memory-related 
metrics of all RocksDB state stores of all tasks assigned to all stream threads 
of one client.

One option to reduce the number of reported metrics is to add a client-level 
metric within Kafka Streams that aggregates some state store metrics, e.g., 
for monitoring memory usage.
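
The aggregation could look roughly like the following sketch. This is purely 
illustrative: the store names and byte values are made up, and the helper 
simply sums per-store values of one memory-related metric (such as RocksDB's 
size-all-mem-tables) into a single client-level number, so one metric replaces 
N store-level ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientLevelAggregation {

    // Sums per-store values of one memory-related metric into a single
    // client-level value. The map keys are state store names; the values
    // are the per-store metric readings in bytes.
    static long aggregate(final Map<String, Long> perStoreBytes) {
        return perStoreBytes.values().stream()
            .mapToLong(Long::longValue)
            .sum();
    }

    public static void main(final String[] args) {
        // Hypothetical per-store readings for one Kafka Streams client.
        final Map<String, Long> memTableBytes = new LinkedHashMap<>();
        memTableBytes.put("store-1", 64L * 1024 * 1024);
        memTableBytes.put("store-2", 32L * 1024 * 1024);
        memTableBytes.put("store-3", 32L * 1024 * 1024);

        // One client-level metric instead of three store-level ones.
        System.out.println(aggregate(memTableBytes));
    }
}
```

With this, the monitoring service would see a single memory-usage metric per 
client instead of one per state store, which keeps the metric count stable 
regardless of task assignment.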



--
This message was sent by Atlassian Jira
(v8.3.4#803005)