EdwinIngJ opened a new issue, #18611:
URL: https://github.com/apache/druid/issues/18611

   I noticed some nondeterminism in the following tests:
   - `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplit`
   - `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testMultiTask`

   Specifically, the problem in both tests is that they assume a particular ordering of the captured tasks.
   ### For `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplit`
   ```
   [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.696 s <<< FAILURE! -- in org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest
   [ERROR] org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplit -- Time elapsed: 0.692 s <<< FAILURE!
   java.lang.AssertionError: expected:<0> but was:<1>
           at org.junit.Assert.fail(Assert.java:89)
           at org.junit.Assert.failNotEquals(Assert.java:835)
           at org.junit.Assert.assertEquals(Assert.java:120)
           at org.junit.Assert.assertEquals(Assert.java:146)
           at org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplitPhaseTwo(KinesisSupervisorTest.java:4476)
           at org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplit(KinesisSupervisorTest.java:4182)
           at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
           at java.base/java.lang.reflect.Method.invoke(Method.java:580)
           at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
           at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
           at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
           at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
           at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
           at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
           at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
           at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
           at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
           at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
           at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
           at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
           at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
           at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
           at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
           at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
           at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
           at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
           at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
           at org.junit.runners.Suite.runChild(Suite.java:128)
           at org.junit.runners.Suite.runChild(Suite.java:27)
           at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
           at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
           at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
           at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
           at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
           at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
           at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
           at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:49)
           at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:120)
           at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:95)
           at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
           at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:69)
           at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:146)
           at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
           at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
           at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
           at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
   ```
   **Problem**: `testShardSplit` calls `testShardSplitPhaseTwo` and `testShardSplitPhaseThree`. Both helpers assert that the first element of the captured `postSplitTasks` list always has `TaskGroupId=0` and the second element always has `TaskGroupId=1`. However, the actual order depends on the iteration order of the `activelyReadingTaskGroups` `ConcurrentHashMap` in `SeekableStreamSupervisor`. More specifically, it depends on [this line](https://github.com/apache/druid/blob/a4d0433e0bdf546cdcbbca1c54943ee6a5f7b3ef/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L3915) in `createNewTasks`, where each `Task` is created. Since the iteration order of `.entrySet()` is not guaranteed, the order of the captured `postSplitTasks` list is not guaranteed either.
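
One way a test could avoid depending on creation order is to sort the captured tasks by task group id before making positional assertions. The following is only a minimal sketch of that idea: `CapturedTask` and `sortedByGroupId` are hypothetical stand-ins introduced here for illustration, not types or helpers from the Druid code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderInsensitiveAssertions
{
  /** Hypothetical stand-in for a captured task; only the group id matters here. */
  static final class CapturedTask
  {
    final int taskGroupId;
    final String name;

    CapturedTask(int taskGroupId, String name)
    {
      this.taskGroupId = taskGroupId;
      this.name = name;
    }
  }

  /** Returns a copy of the captured tasks sorted by ascending task group id. */
  static List<CapturedTask> sortedByGroupId(List<CapturedTask> captured)
  {
    List<CapturedTask> sorted = new ArrayList<>(captured);
    sorted.sort(Comparator.comparingInt(t -> t.taskGroupId));
    return sorted;
  }

  public static void main(String[] args)
  {
    // Simulate a capture in which the task for group 1 happened to be created first.
    List<CapturedTask> captured = List.of(
        new CapturedTask(1, "task-for-group-1"),
        new CapturedTask(0, "task-for-group-0")
    );

    List<CapturedTask> sorted = sortedByGroupId(captured);

    // Positional checks no longer depend on the order in which tasks were created.
    if (sorted.get(0).taskGroupId != 0 || sorted.get(1).taskGroupId != 1) {
      throw new AssertionError("unexpected order after sorting");
    }
  }
}
```

Sorting a copy leaves the captured list untouched, so any other assertion that inspects the raw capture keeps working.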
   
   I can consistently reproduce the failure in `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testShardSplit` by modifying line [3915](https://github.com/apache/druid/blob/a4d0433e0bdf546cdcbbca1c54943ee6a5f7b3ef/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L3915) in `createNewTasks` as follows:
   ```java
   -    for (Entry<Integer, TaskGroup> entry : activelyReadingTaskGroups.entrySet()) {
   +    List<Entry<Integer, TaskGroup>> sortedEntries = new ArrayList<>(activelyReadingTaskGroups.entrySet());
   +    sortedEntries.sort((e1, e2) -> Integer.compare(e2.getKey(), e1.getKey())); // sort entries in descending order
   +    for (Entry<Integer, TaskGroup> entry : sortedEntries) {
   ```
   
   Compiling and running `mvn -pl extensions-core/kinesis-indexing-service test -Dtest=org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest#testShardSplit` then produces the same error as the one shown above.
   
   
   ### For `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testMultiTask`
   ```
   [ERROR] Tests run: 4, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 2.717 s <<< FAILURE! -- in org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest
   [ERROR] org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testMultiTask -- Time elapsed: 0.982 s <<< FAILURE!
   java.lang.AssertionError: expected:<0> but was:<null>
           at org.junit.Assert.fail(Assert.java:89)
           at org.junit.Assert.failNotEquals(Assert.java:835)
           at org.junit.Assert.assertEquals(Assert.java:120)
           at org.junit.Assert.assertEquals(Assert.java:146)
           at org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testMultiTask(KinesisSupervisorTest.java:609)
           at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
           at java.base/java.lang.reflect.Method.invoke(Method.java:580)
           at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
           at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
           at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
           at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
           at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
           at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
           at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
           at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
           at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
           at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
           at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
           at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
           at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
           at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
           at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
           at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
           at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
           at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
           at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
           at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
           at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
           at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:214)
           at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:155)
           at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
           at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
           at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
           at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
   ```
   
   **Problem**: As in the previous test, `testMultiTask` has assertions that assume a particular ordering of the captured `KinesisIndexTask` instances. Again, the actual order depends on the iteration order of the `activelyReadingTaskGroups` `ConcurrentHashMap` in `SeekableStreamSupervisor`.
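
The `expected:<0> but was:<null>` message would be consistent with looking up a shard key on a captured task that does not hold that shard. The assertion itself is not shown here, so that reading is an assumption. Under that assumption, a test could select the task by its contents instead of by its position in the capture. The sketch below uses plain maps as hypothetical stand-ins for each task's shard-to-sequence-number mapping. `findTaskContaining` and the shard ids are invented for illustration and are not part of Druid.

```java
import java.util.List;
import java.util.Map;

public class FindTaskByShard
{
  /**
   * Returns the first shard-to-sequence map that contains the given shard id.
   * Each map is a hypothetical stand-in for the start sequences of one captured task.
   */
  static Map<String, String> findTaskContaining(List<Map<String, String>> capturedTasks, String shardId)
  {
    for (Map<String, String> task : capturedTasks) {
      if (task.containsKey(shardId)) {
        return task;
      }
    }
    throw new AssertionError("no captured task reads shard " + shardId);
  }

  public static void main(String[] args)
  {
    // Simulate a capture in which the task reading shard-1 was created first.
    List<Map<String, String>> captured = List.of(
        Map.of("shard-1", "0"),
        Map.of("shard-0", "0")
    );

    // A positional lookup, captured.get(0).get("shard-0"), would return null here.
    Map<String, String> task = findTaskContaining(captured, "shard-0");
    if (!"0".equals(task.get("shard-0"))) {
      throw new AssertionError("unexpected start sequence for shard-0");
    }
  }
}
```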
   
   I can also consistently reproduce the failure in `org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorTest.testMultiTask` using the same change to line [3915](https://github.com/apache/druid/blob/a4d0433e0bdf546cdcbbca1c54943ee6a5f7b3ef/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L3915) described above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

