Greg Harris created KAFKA-14338:
-----------------------------------

             Summary: Connect RetryUtilTest flakey in CPU-limited environments
                 Key: KAFKA-14338
                 URL: https://issues.apache.org/jira/browse/KAFKA-14338
             Project: Kafka
          Issue Type: Test
          Components: KafkaConnect
            Reporter: Greg Harris
            Assignee: Greg Harris


The RetryUtilTest added alongside RetryUtil in
[https://github.com/apache/kafka/pull/11797] has unresolved flakiness
issues in CPU-restricted environments. I was able to reproduce two flaky
failures with a 2% CPU throttle in place:
{noformat}
1) testExhaustingRetries(org.apache.kafka.connect.util.RetryUtilTest)
org.junit.runners.model.TestTimedOutException: test timed out after 1000 milliseconds
        at org.junit.internal.runners.MethodRoadie$1.run(MethodRoadie.java:78)
        at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:97)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:310)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:131)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.access$100(PowerMockJUnit47RunnerDelegateImpl.java:59)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner$TestExecutorStatement.evaluate(PowerMockJUnit47RunnerDelegateImpl.java:147)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.evaluateStatement(PowerMockJUnit47RunnerDelegateImpl.java:107)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:298)
        at org.junit.internal.runners.MethodRoadie.runWithTimeout(MethodRoadie.java:58)
        at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:48)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:218)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:160)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:134)
        at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
        at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:44)
        at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:136)
        at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:117)
        at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:57)
        at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
        at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
2) retriesEventuallySucceed(org.apache.kafka.connect.util.RetryUtilTest)
org.apache.kafka.connect.errors.ConnectException: Fail to Test after 1 attempts.  Reason: null
        at org.apache.kafka.connect.util.RetryUtil.retryUntilTimeout(RetryUtil.java:101)
        at org.apache.kafka.connect.util.RetryUtilTest.retriesEventuallySucceed(RetryUtilTest.java:74)
        ... 38 trimmed
Caused by: org.apache.kafka.common.errors.TimeoutException
{noformat}
Rather than relying on fixed wall-clock timeouts, the tests should be rewritten
so that deadlocks are impossible and the retries proceed deterministically,
regardless of how much CPU is available.
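
A minimal sketch of the deterministic approach, assuming the retry loop can be handed an injectable clock. Every name below (DeterministicRetrySketch, Clock, FakeClock, retryUntilTimeout) is hypothetical and is not the actual RetryUtil API. Because FakeClock.sleep advances time instantly instead of blocking, the number of attempts made before the deadline depends only on the timeout and backoff values, not on how much CPU the test receives, and no JUnit wall-clock timeout is needed. Kafka's own Time interface and its MockTime test utility could likely fill the same role; the sketch defines its own clock only to stay self-contained.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are Kafka's RetryUtil API.
public class DeterministicRetrySketch {

    // Hypothetical time abstraction so the retry loop never reads the wall clock directly.
    interface Clock {
        long milliseconds();
        void sleep(long ms);
    }

    // Fake clock for tests: sleep() advances time instantly instead of blocking,
    // so the outcome does not depend on CPU speed or thread scheduling.
    static final class FakeClock implements Clock {
        private long nowMs = 0;

        @Override
        public long milliseconds() {
            return nowMs;
        }

        @Override
        public void sleep(long ms) {
            nowMs += ms;
        }
    }

    // Calls `callable` until it succeeds. After each failure it backs off using the
    // injected clock, and it gives up once an attempt fails at or after the deadline.
    static <T> T retryUntilTimeout(Callable<T> callable, long timeoutMs, long backoffMs, Clock clock) {
        long deadline = clock.milliseconds() + timeoutMs;
        Exception lastFailure = null;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return callable.call();
            } catch (Exception e) {
                lastFailure = e;
            }
            long remaining = deadline - clock.milliseconds();
            if (remaining <= 0) {
                throw new IllegalStateException("Failed after " + attempts + " attempts", lastFailure);
            }
            // Never sleep past the deadline, so one final attempt happens exactly at it.
            clock.sleep(Math.min(backoffMs, remaining));
        }
    }

    public static void main(String[] args) {
        // Case 1: the call fails twice, then succeeds.
        FakeClock clock = new FakeClock();
        AtomicInteger calls = new AtomicInteger();
        String result = retryUntilTimeout(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("transient failure");
            }
            return "ok";
        }, 100, 10, clock);
        System.out.println(result + " after " + calls.get() + " calls at t=" + clock.milliseconds() + "ms");

        // Case 2: the call always fails; the attempt count is fixed by timeout and backoff alone.
        FakeClock clock2 = new FakeClock();
        try {
            retryUntilTimeout(() -> { throw new RuntimeException("permanent failure"); }, 100, 10, clock2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " at t=" + clock2.milliseconds() + "ms");
        }
    }
}
```

With a 100 ms timeout and a 10 ms backoff, a call that fails twice succeeds on its third attempt at fake time 20 ms, and a call that always fails gets exactly 11 attempts (at t = 0, 10, ..., 100) before the loop gives up. A test could assert on those counts directly instead of racing a wall-clock timer.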



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
