Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-23 Thread via GitHub


dajac merged PR #15987:
URL: https://github.com/apache/kafka/pull/15987


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-21 Thread via GitHub


chia7712 commented on PR #15987:
URL: https://github.com/apache/kafka/pull/15987#issuecomment-2122683021

   I will merge it after QA complete


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-20 Thread via GitHub


jeffkbkim commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1607007949


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   @gaurav-narula thanks for the feedback. that makes sense, i have 
incorporated your suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605528394


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   Nvm, I see your point - `accumulator.take()` may be invoked - the block on 
the delegate may happen *after* the timer has been incremented.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605530429


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   IIUC, it's the ordering that matters so we can avoid the null check and 
increment time only after `super.take()` returns.
   
   ```java
   CoordinatorEvent event = super.take(); // blocks until there are 
new events
   time.sleep(takeDelayMs); // only increment timer after we get 
unblocked
   return event;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605530429


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   IIUC, it's the ordering that matters so we can avoid the null check and 
increment time only after `super.take()` returns.
   
   ```java
   CoordinatorEvent event = super.take();
   time.sleep(takeDelayMs);
   return event;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605528394


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   Nvm, I see your point - `accumulator.take()` may be invoked - the block on 
the delagate may happen after the timer is incremented.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605525333


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   > we may be at the point when we process the 8 events (diff == 800ms) or 
when the thread does another iteration over handleEvents(), which does another 
time.sleep() (diff == 900ms). This was causing the flakiness.
   
   Sorry, I'm still confused. Why would another iteration over `handleEvents()` 
invoke the `doAnswer`? In other words, shouldn't `accumulator.take()` block 
since there are no more events? I'd imagine `EventAccumulator::take` to block 
on `condition.await()` after 8 events have been processed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


jeffkbkim commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605519141


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   That's correct and was my initial confusion as well. The event processor 
shuts down at the end of the test so we have the processor thread still running 
when we perform the assertions.
   
   When we call
   ```
   long diff = time.milliseconds() - startMs;
   ```
   we may be at the point when we process the 8 events (diff == 800ms) or when 
the thread does another iteration over handleEvents(), which does another 
time.sleep() (diff == 900ms). This was causing the flakiness.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: fix flaky testRecordThreadIdleRatio [kafka]

2024-05-17 Thread via GitHub


gaurav-narula commented on code in PR #15987:
URL: https://github.com/apache/kafka/pull/15987#discussion_r1605472276


##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -475,9 +478,7 @@ public void testRecordThreadIdleRatio() throws Exception {
 doAnswer(invocation -> {
 long threadIdleTime = idleTimeCaptured.getValue();
 assertEquals(100, threadIdleTime);
-synchronized (recordedIdleTimesMs) {
-recordedIdleTimesMs.add(threadIdleTime);
-}
+recordedIdleTimesMs.add(threadIdleTime);

Review Comment:
   Nit: might be useful to add a comment that this is safe because `numThreads` 
in line 470 is `1`.



##
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessorTest.java:
##
@@ -61,8 +61,11 @@ public DelayEventAccumulator(Time time, long takeDelayMs) {
 
 @Override
 public CoordinatorEvent take() {
-time.sleep(takeDelayMs);
-return super.take();
+CoordinatorEvent event = super.take();

Review Comment:
   IIUC, the fix suggests the flakyness is because `super.take()` returns 
`null` spuriously. I don't quite follow why that would happen though. Javadoc 
for `EventAccumulator::take` mentions it returns `null` only when the 
accumulator is closed. We also assert the value captured within doAnswer is 
`100`, `recordedIdleTimesMs.size() == 8` and 
`mockRuntimeMetrics.recordThreadIdleTime()` is invoked 8 times so where is the 
extra invocation coming from?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org