[
https://issues.apache.org/jira/browse/KAFKA-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax reassigned KAFKA-7240:
--------------------------------------
Assignee: Sam Lendle (was: John Roesler)
> -total metrics in Streams are incorrect
> ---------------------------------------
>
> Key: KAFKA-7240
> URL: https://issues.apache.org/jira/browse/KAFKA-7240
> Project: Kafka
> Issue Type: Bug
> Components: metrics, streams
> Affects Versions: 2.0.0
> Reporter: Sam Lendle
> Assignee: Sam Lendle
> Priority: Major
>
> I noticed the values of total metrics for streams were decreasing
> periodically when viewed in JMX, for example process-total for each
> processor-node-id under stream-processor-node-metrics.
> Edit: For processor node metrics, I should have been looking at
> ProcessorNode, not StreamsMetricsThreadImpl.
> -Looking at StreamsMetricsThreadImpl, I believe this behavior is due to
> using Count() as the Stat for the *-total metrics. Count() is a SampledStat,
> so the value it reports is the count in recent time windows, and the value
> decreases whenever a window is purged.-
> ----
> -This explains the behavior I saw, but I think the issue is deeper. For
> example, processTimeSensor attempts to measure process-latency-avg,
> process-latency-max, process-rate, and process-total. For that sensor, record
> is called like-
> -streamsMetrics.processTimeSensor.record(computeLatency() / (double)
> processed, timerStartedMs);-
> -so the value passed to record is the average latency per processed message
> in this batch, if I understand correctly. That gets pushed through to the
> call to Count#record, which increments its count by 1, ignoring the value
> parameter. What the stat recording the total would need to know is the number
> of messages processed. Because of that, I don't think it's possible for one
> Sensor to measure both latency and total.-
> -That said, it's not clear to me how all the different Stats work or how
> exactly Sensors work. For similar reasons, I don't understand how the
> process-rate metric works, yet it seems to be correct, so I may be missing
> something here.-
>
> cc [~guozhang]
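
The two effects described in the report (a windowed count dropping when old windows are purged, and a count that ignores the recorded value) can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not Kafka's implementation: the classes WindowedCount and CumulativeTotal, the window length, the number of retained windows, and the batch size of 10 are all hypothetical and chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model; none of these classes are Kafka's own.
final class TotalMetricSketch {

    /** Counts record() calls in recent time windows and forgets expired windows. */
    static final class WindowedCount {
        private final long windowMs;
        private final int maxWindows;
        // Each entry is {windowStartMs, callCount}.
        private final Deque<long[]> windows = new ArrayDeque<>();

        WindowedCount(long windowMs, int maxWindows) {
            this.windowMs = windowMs;
            this.maxWindows = maxWindows;
        }

        void record(double value, long nowMs) {
            purge(nowMs);
            long[] last = windows.peekLast();
            if (last == null || nowMs - last[0] >= windowMs) {
                last = new long[] {nowMs, 0};
                windows.addLast(last);
            }
            last[1]++; // the value argument is ignored; only the call is counted
        }

        long measure(long nowMs) {
            purge(nowMs);
            long sum = 0;
            for (long[] w : windows) {
                sum += w[1];
            }
            return sum;
        }

        private void purge(long nowMs) {
            long horizonMs = windowMs * maxWindows;
            while (!windows.isEmpty() && nowMs - windows.peekFirst()[0] >= horizonMs) {
                windows.removeFirst();
            }
        }
    }

    /** Running sum that never forgets; it must be fed the number of messages. */
    static final class CumulativeTotal {
        private double total;

        void record(double value) {
            total += value;
        }

        double measure() {
            return total;
        }
    }

    public static void main(String[] args) {
        WindowedCount windowed = new WindowedCount(1_000, 2);
        CumulativeTotal cumulative = new CumulativeTotal();

        // Three batches, one per second, each assumed to contain 10 messages.
        for (long t = 0; t < 3_000; t += 1_000) {
            double avgLatencyMs = 5.0;          // what a latency sensor would record
            windowed.record(avgLatencyMs, t);   // counts 1 per batch, whatever the value
            cumulative.record(10);              // needs the message count instead
        }

        System.out.println("windowed just after recording: " + windowed.measure(2_000));
        System.out.println("windowed after windows expire: " + windowed.measure(10_000));
        System.out.println("cumulative total: " + cumulative.measure());
    }
}
```

In this model the windowed figure shrinks once old windows fall outside the retention horizon, while the cumulative figure only grows. It also shows why a single recorded value cannot serve both purposes: the latency stats want the per-message latency, whereas a total wants the number of messages.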