Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
dajac merged PR #15987: URL: https://github.com/apache/kafka/pull/15987 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
chia7712 commented on PR #15987: URL: https://github.com/apache/kafka/pull/15987#issuecomment-2122683021 I will merge it after QA complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
jeffkbkim commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1607007949 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: @gaurav-narula thanks for the feedback. that makes sense, i have incorporated your suggestion -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605528394 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: Nvm, I see your point - `accumulator.take()` may be invoked - the block on the delegate may happen *after* the timer has been incremented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605530429 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: IIUC, it's the ordering that matters so we can avoid the null check and increment time only after `super.take()` returns. ```java CoordinatorEvent event = super.take(); // blocks until there are new events time.sleep(takeDelayMs); // only increment timer after we get unblocked return event; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605530429 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: IIUC, it's the ordering that matters so we can avoid the null check and increment time only after `super.take()` returns. ```java CoordinatorEvent event = super.take(); time.sleep(takeDelayMs); return event; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605528394 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: Nvm, I see your point - `accumulator.take()` may be invoked - the block on the delagate may happen after the timer is incremented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605525333 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: > we may be at the point when we process the 8 events (diff == 800ms) or when the thread does another iteration over handleEvents(), which does another time.sleep() (diff == 900ms). This was causing the flakiness. Sorry, I'm still confused. Why would another iteration over `handleEvents()` invoke the `doAnswer`? In other words, shouldn't `accumulator.take()` block since there are no more events? I'd imagine `EventAccumulator::take` to block on `condition.await()` after 8 events have been processed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
jeffkbkim commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605519141 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: That's correct and was my initial confusion as well. The event processor shuts down at the end of the test so we have the processor thread still running when we perform the assertions. When we call ``` long diff = time.milliseconds() - startMs; ``` we may be at the point when we process the 8 events (diff == 800ms) or when the thread does another iteration over handleEvents(), which does another time.sleep() (diff == 900ms). This was causing the flakiness. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]
gaurav-narula commented on code in PR #15987: URL: https://github.com/apache/kafka/pull/15987#discussion_r1605472276 ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -475,9 +478,7 @@ public void testRecordThreadIdleRatio() throws Exception { doAnswer(invocation -> { long threadIdleTime = idleTimeCaptured.getValue(); assertEquals(100, threadIdleTime); -synchronized (recordedIdleTimesMs) { -recordedIdleTimesMs.add(threadIdleTime); -} +recordedIdleTimesMs.add(threadIdleTime); Review Comment: Nit: might be useful to add a comment that this is safe because `numThreads` in line 470 is `1`. ## group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java: ## @@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) { @Override public CoordinatorEvent take() { -time.sleep(takeDelayMs); -return super.take(); +CoordinatorEvent event = super.take(); Review Comment: IIUC, the fix suggests the flakyness is because `super.take()` returns `null` spuriously. I don't quite follow why that would happen though. Javadoc for `EventAccumulator::take` mentions it returns `null` only when the accumulator is closed. We also assert the value captured within doAnswer is `100`, `recordedIdleTimesMs.size() == 8` and `mockRuntimeMetrics.recordThreadIdleTime()` is invoked 8 times so where is the extra invocation coming from? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org