Re: [PR] KAFKA-17503: Fix incorrect calculation of poll-idle-ratio-avg for con… [kafka]

via GitHub Wed, 18 Dec 2024 04:01:57 -0800


srdo commented on code in PR #17139:
URL: https://github.com/apache/kafka/pull/17139#discussion_r1890122551



##########
clients/src/test/java/org/apache/kafka/clients/consumer/KafkaConsumerTest.java:
##########
@@ -3098,31 +3098,53 @@ public void testPollIdleRatio(GroupProtocol 
groupProtocol) {
         assertEquals(Double.NaN, 
consumer.metrics().get(pollIdleRatio).metricValue());
 
         // 1st poll
-        // Spend 50ms in poll so value = 1.0
+        // Spend 50ms in poll. value=NaN because "the fraction of time spent 
inside poll" is undefined until the polling interval has an end point,
+        // which is only known once the second poll starts.
+        // This also means the metric will always exclude the latest poll, 
since we don't know how much time is spent outside poll for that interval
+        // until poll is called again

Review Comment:
   I can change this, but the consequence is that we will exclude the first 
poll rather than the last poll.
   
   The issue is essentially that this metric is measuring a property of the 
interval between two polls, which means you're going to have to either exclude 
either the first or the last poll depending on whether you choose to start 
intervals at the start or end of the poll.
   
   If you define the poll interval as being from start to start (as I did 
here), that means the last poll is excluded because until the poll after that 
begins, the interval for the last poll has no endpoint.
   
   If you define the poll interval as being from end to end (as you suggest), 
the first poll is excluded, because there is no starting point for that 
interval.
   
   Here's a small illustration:
   
   Picking the start time of the poll call as the boundary of the interval, the 
poll intervals for 3 sequential completed polls look like this:
   
   ```
   [start1, start2], [start2, start3], [start3, ?].
   ```
   By using these boundaries, each poll interval starts when poll is called, 
and ends when the subsequent poll is called. The time you spent outside of poll 
in an interval is the time _after_ poll returns.
   
   The third interval has an unknown boundary because that boundary point will 
be the start time of the fourth poll, and that poll hasn't started yet. This 
means we have to wait to record that last interval until then, which means the 
last poll is not included in the metric.
   
   If we instead pick the end time as the boundary, we get these intervals:
   
   ```
   [?, end1], [end1, end2], [end2, end3]
   ```
   By using these boundaries, each poll interval starts when the previous poll 
returns, and ends when the current poll returns. The time you spent outside of 
poll in a specific interval is the time _before_ you called poll.
   
   The first interval has an unknown boundary, because there is no "previous 
poll" to get an end point from, and so that interval will be excluded from the 
metric.
   
   I don't know which is preferable, they seem pretty equal to me. Do you have 
an opinion @ahuang98?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] KAFKA-17503: Fix incorrect calculation of poll-idle-ratio-avg for con… [kafka]

Reply via email to