[ 
https://issues.apache.org/jira/browse/KAFKA-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-14664:
------------------------------------
    Description: 
The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO 
thread is. When completely idle, it should measure 1. When saturated, it should 
measure 0. The problem with the current measurements is that they are treated 
equally with respect to time. For example, say we poll twice with the following 
durations:

Poll 1: 2s

Poll 2: 0s

Assume that the busy time is negligible, so 2s passes overall.

In the first measurement, 2s is spent waiting, so we compute and record a ratio 
of 1.0. In the second measurement, no time passes, and we record 0.0. The idle 
ratio is then computed as the average of these two values (1.0 + 0.0 / 2 = 
0.5), which suggests that the process was busy for 1s, which overestimates the 
true busy time.

Instead, we should sum up the time waiting over the full interval. 2s passes 
total here and 2s is idle, so we should compute 1.0.

  was:
The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO 
thread is. When completely idle, it should measure 1. When saturated, it should 
measure 0. The problem with the current measurements is that they are treated 
equally with respect to time. For example, say we poll twice with the following 
durations:

Poll 1: 2s

Poll 2: 0s

Assume that the busy time is negligible, so 2s passes overall.

In the first measurement, 2s is spent waiting, so we compute and record a ratio 
of 1.0. In the second measurement, no time passes, and we record 0.0. The idle 
ratio is then computed as the average of these two values (1.0 + 0.0 / 2 = 
0.5), which suggests that the process was busy for 1s. 

Instead, we should sum up the time waiting over the full interval. 2s passes 
total here and 2s is idle, so we should compute 1.0.


> Raft idle ratio is inaccurate
> -----------------------------
>
>                 Key: KAFKA-14664
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14664
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO 
> thread is. When completely idle, it should measure 1. When saturated, it 
> should measure 0. The problem with the current measurements is that they are 
> treated equally with respect to time. For example, say we poll twice with the 
> following durations:
> Poll 1: 2s
> Poll 2: 0s
> Assume that the busy time is negligible, so 2s passes overall.
> In the first measurement, 2s is spent waiting, so we compute and record a 
> ratio of 1.0. In the second measurement, no time passes, and we record 0.0. 
> The idle ratio is then computed as the average of these two values (1.0 + 0.0 
> / 2 = 0.5), which suggests that the process was busy for 1s, which 
> overestimates the true busy time.
> Instead, we should sum up the time waiting over the full interval. 2s passes 
> total here and 2s is idle, so we should compute 1.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to