[ https://issues.apache.org/jira/browse/IGNITE-14671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Shishkov updated IGNITE-14671:
-----------------------------------
    Fix Version/s: 2.11

> Test IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is flaky
> ---------------------------------------------------------------------------------
>
>                 Key: IGNITE-14671
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14671
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.11
>            Reporter: Ilya Shishkov
>            Assignee: Ilya Shishkov
>            Priority: Trivial
>              Labels: iep-43
>             Fix For: 2.11
>
>         Attachments: testClusterSnapshotCheckOtherCluster_fix.patch, testClusterSnapshotCheckOtherCluster_printCount.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> To reproduce the failure, run the test several times; for example, configure the IDE to run it with the 'repeat until failure' option. Eventually you will get an assertion error:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> With patch [1] applied, the exception looks as follows:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> It seems to be a concurrent update problem with a non-thread-safe HashSet (see [2, 3]):
> {code:java|title=Unsafe HashSet}
> Set<UUID> assigns = new HashSet<>();
> {code}
> {code:java|title=Concurrent update}
> grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
>     @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
>         if (msg instanceof GridJobExecuteRequest) {
>             GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
>
>             if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
>                 assigns.add(locNodeId);
>         }
>     }
> });
> {code}
> With a concurrent Set implementation the problem does not reproduce (see patch [4]):
> {code:java}
> Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
> {code}
> # [^testClusterSnapshotCheckOtherCluster_printCount.patch]
> # [IgniteClusterSnapshotCheckTest.java#L287|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L287]
> # [IgniteClusterSnapshotCheckTest.java#L300|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L300]
> # [^testClusterSnapshotCheckOtherCluster_fix.patch]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
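
The concurrent-update scenario from the issue description can be sketched outside of Ignite with plain JDK threads. This is only an illustration under stated assumptions: the class name `ConcurrentAssignsSketch` and the method `collectConcurrently` are hypothetical, the worker threads stand in for the message listener threads, and random UUIDs stand in for node IDs. Only the set construction is taken from the fix quoted in the issue.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone sketch; not part of the Ignite code base. */
public class ConcurrentAssignsSketch {
    /**
     * Adds {@code perThread} random UUIDs from each of {@code threads} threads
     * into one shared thread-safe set and returns that set.
     */
    public static Set<UUID> collectConcurrently(int threads, int perThread) {
        // Same construction as in the quoted fix: a Set view over a ConcurrentHashMap.
        Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);

        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                try {
                    // Hold every writer until all are submitted, to increase contention.
                    start.await();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }

                for (int i = 0; i < perThread; i++)
                    assigns.add(UUID.randomUUID());
            });
        }

        start.countDown();
        pool.shutdown();

        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                throw new IllegalStateException("Writers did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return assigns;
    }

    public static void main(String[] args) {
        Set<UUID> assigns = collectConcurrently(8, 10_000);

        System.out.println("Collected " + assigns.size() + " ids");
    }
}
```

Swapping the set for `new HashSet<>()` in this sketch makes the outcome undefined: concurrent `add` calls on an unsynchronized `HashSet` may lose elements, which would be consistent with the unexpected `count: 1` reported above.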