[jira] [Commented] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
[ https://issues.apache.org/jira/browse/FLINK-29611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626538#comment-17626538 ] Chesnay Schepler commented on FLINK-29611: -- Sorry but we are busy enough without chasing theoretical test instabilities. Closing this issue and PR. > Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest > -- > > Key: FLINK-29611 > URL: https://issues.apache.org/jira/browse/FLINK-29611 > Project: Flink > Issue Type: Bug >Reporter: Sopan Phaltankar >Priority: Minor > Labels: pull-request-available > > The test > _org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_ > has the following failure: > Failures: > [ERROR] CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 > Wrong Side Output: arrays first differed at element [0]; expected: 15 : 9:key.6->6> but was:5> > I used the tool [NonDex|https://github.com/TestingResearchIllinois/NonDex] to > find this flaky test. > Command: mvn edu.illinois:nondex-maven-plugun:1.1.2:nondex -Dtest='Fully > Qualified Test Name' > I analyzed the assertion failure and found that the root cause is because the > test method calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries() > which calls the entrySet() method of the underlying HashMap. entrySet() > returns the entries in a non-deterministic way, causing the test to be flaky. > The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is > getting initialized. > On further analysis, it was found that the Map is getting initialized on line > 53 of org.apache.flink.runtime.state.HeapBroadcastState class. > After changing from HashMap to LinkedHashMap, the above test is passing. > Edit: Upon making this change and running the CI, it was found that the tests > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchKeyedBroadcastExecution > and > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchBroadcastExecution > were failing. Upon further investigation, I found that these tests were also > flaky and depended on the earlier made change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
[ https://issues.apache.org/jira/browse/FLINK-29611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624850#comment-17624850 ] Sopan Phaltankar commented on FLINK-29611: -- [~martijnvisser] Here is my PR: [https://github.com/apache/flink/pull/21151] Let me know your feedback > Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest > -- > > Key: FLINK-29611 > URL: https://issues.apache.org/jira/browse/FLINK-29611 > Project: Flink > Issue Type: Bug >Reporter: Sopan Phaltankar >Priority: Minor > Labels: pull-request-available > > The test > _org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_ > has the following failure: > Failures: > [ERROR] CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 > Wrong Side Output: arrays first differed at element [0]; expected: 15 : 9:key.6->6> but was:5> > I used the tool [NonDex|https://github.com/TestingResearchIllinois/NonDex] to > find this flaky test. > Command: mvn edu.illinois:nondex-maven-plugun:1.1.2:nondex -Dtest='Fully > Qualified Test Name' > I analyzed the assertion failure and found that the root cause is because the > test method calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries() > which calls the entrySet() method of the underlying HashMap. entrySet() > returns the entries in a non-deterministic way, causing the test to be flaky. > The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is > getting initialized. > On further analysis, it was found that the Map is getting initialized on line > 53 of org.apache.flink.runtime.state.HeapBroadcastState class. > After changing from HashMap to LinkedHashMap, the above test is passing. > Edit: Upon making this change and running the CI, it was found that the tests > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchKeyedBroadcastExecution > and > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchBroadcastExecution > were failing. Upon further investigation, I found that these tests were also > flaky and depended on the earlier made change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
[ https://issues.apache.org/jira/browse/FLINK-29611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620126#comment-17620126 ] Martijn Visser commented on FLINK-29611: [~chesnay] WDYT? > Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest > -- > > Key: FLINK-29611 > URL: https://issues.apache.org/jira/browse/FLINK-29611 > Project: Flink > Issue Type: Bug >Reporter: Sopan Phaltankar >Priority: Minor > > The test > _org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_ > has the following failure: > Failures: > [ERROR] CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 > Wrong Side Output: arrays first differed at element [0]; expected: 15 : 9:key.6->6> but was:5> > I used the tool [NonDex|https://github.com/TestingResearchIllinois/NonDex] to > find this flaky test. > Command: mvn edu.illinois:nondex-maven-plugun:1.1.2:nondex -Dtest='Fully > Qualified Test Name' > I analyzed the assertion failure and found that the root cause is because the > test method calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries() > which calls the entrySet() method of the underlying HashMap. entrySet() > returns the entries in a non-deterministic way, causing the test to be flaky. > The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is > getting initialized. > On further analysis, it was found that the Map is getting initialized on line > 53 of org.apache.flink.runtime.state.HeapBroadcastState class. > After changing from HashMap to LinkedHashMap, the above test is passing. > Edit: Upon making this change and running the CI, it was found that the tests > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchKeyedBroadcastExecution > and > org.apache.flink.api.datastream.DataStreamBatchExecutionITCase.batchBroadcastExecution > were failing. Upon further investigation, I found that these tests were also > flaky and depended on the earlier made change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
[ https://issues.apache.org/jira/browse/FLINK-29611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17619810#comment-17619810 ] Sopan Phaltankar commented on FLINK-29611: -- [~martijnvisser] This test depends on the order of iteration in `HashMap.entrySet()` and can fail for some orders. I am running this on my local machine. I have used a maven plugin NonDex, which can be used to identify such tests. One can reproduce it with the command `mvn edu.illinois:nondex-maven-plugun:1.1.2:nondex -Dtest=org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest#testMultiStateSupport`. Even if the test was not failing during the daily jobs, it'd be good to not depend on the `HashMap.entrySet()` that gives back results in an undefined manner [JavaDoc|[https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#entrySet--]] for this. Therefore, to remove non-determinism completely, we can do this change. > Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest > -- > > Key: FLINK-29611 > URL: https://issues.apache.org/jira/browse/FLINK-29611 > Project: Flink > Issue Type: Bug >Reporter: Sopan Phaltankar >Priority: Minor > > The test > _org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_ > has the following failure: > Failures: > [ERROR] CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 > Wrong Side Output: arrays first differed at element [0]; expected: 15 : 9:key.6->6> but was:5> > I used the tool [NonDex|https://github.com/TestingResearchIllinois/NonDex] to > find this flaky test. > I analyzed the assertion failure and found that the root cause is because the > test method calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries() > which calls the entrySet() method of the underlying HashMap. entrySet() > returns the entries in a non-deterministic way, causing the test to be flaky. > The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is > getting initialized. > On further analysis, it was found that the Map is getting initialized on line > 53 of org.apache.flink.runtime.state.HeapBroadcastState class. > After changing from HashMap to LinkedHashMap, the above test is passing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
[ https://issues.apache.org/jira/browse/FLINK-29611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616866#comment-17616866 ] Martijn Visser commented on FLINK-29611: [~sopan98] Are you encountering this on your local machine? This test is being run during every PR, every merged commit and also during the nightly build jobs, but it has never been flaky (else it would have been registered in Jira before). > Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest > -- > > Key: FLINK-29611 > URL: https://issues.apache.org/jira/browse/FLINK-29611 > Project: Flink > Issue Type: Bug >Reporter: Sopan Phaltankar >Priority: Minor > > The test > _org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_ > has the following failure: > Failures: > [ERROR] CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 > Wrong Side Output: arrays first differed at element [0]; expected: 15 : 9:key.6->6> but was:5> > I used the tool [NonDex|https://github.com/TestingResearchIllinois/NonDex] to > find this flaky test. > I analyzed the assertion failure and found that the root cause is because the > test method calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries() > which calls the entrySet() method of the underlying HashMap. entrySet() > returns the entries in a non-deterministic way, causing the test to be flaky. > The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is > getting initialized. > On further analysis, it was found that the Map is getting initialized on line > 53 of org.apache.flink.runtime.state.HeapBroadcastState class. > After changing from HashMap to LinkedHashMap, the above test is passing. -- This message was sent by Atlassian Jira (v8.20.10#820010)