[ https://issues.apache.org/jira/browse/KAFKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324669#comment-15324669 ]
Jay Kreps commented on KAFKA-3811:
----------------------------------

I'm not wild about introducing these levels in an ad hoc way in Kafka Streams. A couple of other options:
1. Make the metrics lower overhead (this is an issue in the producer too).
2. Optimize the usage of metrics in the consumer and streams (e.g. in the producer we increment metrics in batch to avoid locking on each message).
3. Add a general-purpose feature to the metrics library and use it across the producer, consumer, and streams.

For (3), here is what I am thinking. What you are describing is a bit like log4j, where DEBUG-level logging is cheap or free when you haven't turned it on. Basically, what I'm imagining is a new attribute in org.apache.kafka.common.metrics.Sensor, something like DEBUG/INFO, plus a global level that is set (and perhaps can be changed via JMX); the locking and update of the sensor happen only if the appropriate level or lower is active. Then we would categorize existing metrics with this attribute throughout the producer, consumer, and streams. (Arguably this should be at the metric level rather than the sensor level, but I'm not sure it's possible to make that cheap. If so, that might be better.)

> Introduce Kafka Streams metrics recording levels
> ------------------------------------------------
>
>                 Key: KAFKA-3811
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3811
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Greg Fodor
>            Assignee: aarti gupta
>
> Follow-up from the discussions here:
> https://github.com/apache/kafka/pull/1447
> https://issues.apache.org/jira/browse/KAFKA-3769
> The proposal is to introduce configuration to control the granularity/volumes
> of metrics emitted by Kafka Streams jobs, since the per-record level metrics
> introduce non-trivial overhead and are possibly less useful once a job has
> been optimized.
> Proposal from guozhangwang:
> level0 (stream thread global): per-record process / punctuate latency, commit
> latency, poll latency, etc
> level1 (per processor node, and per state store): IO latency, per-record ..
> latency, forward throughput, etc.
> And by default we only turn on level0.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
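
For illustration only, option (3) in the comment above might be sketched roughly as follows. This is a standalone sketch, not the Kafka metrics library: the names RecordingLevel, MetricsLevelConfig, and LeveledSensor are hypothetical, and the interpretation that a DEBUG sensor records only when the global level is DEBUG (while an INFO sensor records at either level) is an assumption drawn from the log4j analogy.

```java
// Hypothetical sketch of a per-sensor recording level with a global switch.
// None of these types are part of org.apache.kafka.common.metrics.

enum RecordingLevel {
    INFO(0), DEBUG(1);

    final int verbosity;

    RecordingLevel(int verbosity) {
        this.verbosity = verbosity;
    }

    // A sensor at this level is active only when the configured global
    // level is at least as verbose as the sensor's own level.
    boolean shouldRecord(RecordingLevel configured) {
        return this.verbosity <= configured.verbosity;
    }
}

// Holds the global level. The field is volatile so that a change (for
// example one made through a management interface) becomes visible to
// recording threads without taking a lock.
final class MetricsLevelConfig {
    private volatile RecordingLevel level = RecordingLevel.INFO;

    RecordingLevel level() {
        return level;
    }

    void setLevel(RecordingLevel newLevel) {
        this.level = newLevel;
    }
}

final class LeveledSensor {
    private final RecordingLevel sensorLevel;
    private final MetricsLevelConfig config;
    private long count;
    private double total;

    LeveledSensor(RecordingLevel sensorLevel, MetricsLevelConfig config) {
        this.sensorLevel = sensorLevel;
        this.config = config;
    }

    void record(double value) {
        // Cheap check first: one volatile read and an integer comparison.
        // A disabled sensor returns here without locking or updating.
        if (!sensorLevel.shouldRecord(config.level())) {
            return;
        }
        synchronized (this) {
            count++;
            total += value;
        }
    }

    synchronized long count() {
        return count;
    }

    synchronized double total() {
        return total;
    }
}
```

With the global level left at INFO, a call to record() on a DEBUG sensor costs a volatile read and a comparison and never touches the lock, which is the "cheap or free when you haven't turned it on" property the comment asks for. Whether the check belongs on the sensor or on each individual metric is left open, as it is in the comment.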