[ 
https://issues.apache.org/jira/browse/KAFKA-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568557#comment-16568557
 ] 

John Roesler commented on KAFKA-7240:
-------------------------------------

Oh, by all means, go ahead! I'll assign it to you.

 

If you don't want to do a KIP, just declare the non-metered count metric 
somewhere in a `streams...internals` package (thus, it wouldn't be a public 
interface change).
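
For illustration, here is a minimal, self-contained sketch of a count that only ever accumulates. It assumes that is what the fix needs; the class name `CumulativeCountSketch` and its method shapes are hypothetical and deliberately avoid Kafka's metrics interfaces. A real change would implement the appropriate stat interface inside an internals package.

```java
// Hypothetical stand-in for a non-windowed count stat; this is NOT Kafka's API.
final class CumulativeCountSketch {
    private long count = 0;

    // Same shape as a stat's record call: the value is ignored and
    // every call counts as exactly one event.
    void record(double value, long nowMs) {
        count++;
    }

    // No time windows are kept, so nothing is ever purged and the
    // reported value can never decrease.
    double measure(long nowMs) {
        return count;
    }

    public static void main(String[] args) {
        CumulativeCountSketch total = new CumulativeCountSketch();
        for (long t = 0; t < 5; t++) {
            total.record(1.0, t * 1000L);
        }
        // Measured an hour later, the total is still all five events.
        System.out.println(total.measure(3_600_000L));
    }
}
```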

 

FYI, we're getting close to merging 
https://github.com/apache/kafka/pull/5450, which would affect your changeset, 
so you might want to either watch that PR and wait for the merge, or base your 
change on that branch.

> -total metrics in Streams are incorrect
> ---------------------------------------
>
>                 Key: KAFKA-7240
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7240
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics, streams
>    Affects Versions: 2.0.0
>            Reporter: Sam Lendle
>            Assignee: John Roesler
>            Priority: Major
>
> I noticed the values of total metrics for streams were decreasing 
> periodically when viewed in JMX, for example process-total for each 
> processor-node-id under stream-processor-node-metrics. 
> Edit: For processor node metrics, I should have been looking at 
> ProcessorNode, not StreamsMetricsThreadImpl.
>  -Looking at StreamsMetricsThreadImpl, I believe this behavior is due to 
> using Count() as the Stat for the *-total metrics. Count() is a SampledStat, 
> so the value it reports is the count in recent time windows, and the value 
> decreases whenever a window is purged.-
> ----
> -This explains the behavior I saw, but I think the issue is deeper. For 
> example, processTimeSensor attempts to measure process-latency-avg, 
> process-latency-max, process-rate, and process-total. For that sensor, record 
> is called like this:-
> -streamsMetrics.processTimeSensor.record(computeLatency() / (double) 
> processed, timerStartedMs);-
>  -so the value passed to record is the average latency per processed message 
> in this batch, if I understand correctly. That gets pushed through to the 
> call to Count#record, which increments its count by 1, ignoring the value 
> parameter. Whatever stat is recording the total would need to know the number 
> of messages processed. Because of that, I don't think it's possible for one 
> Sensor to measure both latency and total.-
> -That said, it's not clear to me how all the different Stats and Sensors 
> work. For similar reasons, I don't understand how the process-rate metric 
> works either, though it seems to be correct, so I may be missing something 
> here.-
>   
> cc [~guozhang]
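
The windowed-count behavior described in the quoted issue can be illustrated with a simplified model. This is a sketch under stated assumptions, not Kafka's actual SampledStat code: the class `WindowedCountSketch`, its window length, and its purge rule are hypothetical. It shows two things from the description: the reported count drops once an old window is purged, and each record call adds 1 regardless of the value passed in.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of a windowed ("sampled") count.
final class WindowedCountSketch {
    private final long windowMs;
    private final int numWindows;
    // Each entry is {windowStartMs, eventCount}.
    private final Deque<long[]> windows = new ArrayDeque<>();

    WindowedCountSketch(long windowMs, int numWindows) {
        this.windowMs = windowMs;
        this.numWindows = numWindows;
    }

    // The value is ignored: every call adds 1 to the current window,
    // no matter how many messages the value was averaged over.
    void record(double value, long nowMs) {
        long[] current = windows.peekLast();
        if (current == null || nowMs - current[0] >= windowMs) {
            current = new long[] {nowMs, 0};
            windows.addLast(current);
        }
        current[1]++;
    }

    // Drops windows that have aged out, then sums what is left.
    long measure(long nowMs) {
        while (!windows.isEmpty()
                && nowMs - windows.peekFirst()[0] >= windowMs * numWindows) {
            windows.removeFirst();
        }
        long sum = 0;
        for (long[] w : windows) {
            sum += w[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        WindowedCountSketch total = new WindowedCountSketch(1000L, 2);
        // Two batches of 10 messages each, recorded as average latency per message.
        total.record(42.0 / 10, 0L);
        total.record(37.0 / 10, 1000L);
        System.out.println(total.measure(1500L)); // 2 record calls, not 20 messages
        System.out.println(total.measure(2500L)); // 1: the first window was purged
    }
}
```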



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
