[ 
https://issues.apache.org/jira/browse/IGNITE-14671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-14671:
-----------------------------------
    Description: 
To reproduce failure, run it several times, for example, set up IDE to run test 
with 'repeat until failure' option. Then you will get an assertion error:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size 
(except local node): [a2844419-3081-432a-b611-c4f891900005]

        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
        at 
org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
With applied patch [1] exception would be as follows:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size 
(except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1

        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
        at 
org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
It seems to be a concurrent update problem of thread unsafe HasSet (see [2, 3]):
{code:java|title=Unsafe HashSet}
Set<UUID> assigns = new HashSet<>();
{code}
{code:java|title=Concurrent update}
grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new 
GridMessageListener() {
    @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
        if (msg instanceof GridJobExecuteRequest) {
            GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;

            if 
(msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
                assigns.add(locNodeId);
        }
    }
});
{code}
With concurrent Set implementation problem is not reproducing (see patch [4]):
{code:java}
Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
{code}

#  [^testClusterSnapshotCheckOtherCluster_printCount.patch] 
# 
[IgniteClusterSnapshotCheckTest.java#L287|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L287]
# 
[IgniteClusterSnapshotCheckTest.java#L300|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L300]
#  [^testClusterSnapshotCheckOtherCluster_fix.patch] 

  was:
To reproduce failure, run it several times, for example, set up IDE to run test 
with 'repeat until failure' option. Then you will get an assertion error:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size 
(except local node): [a2844419-3081-432a-b611-c4f891900005]

        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
        at 
org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
With applied patch [1] exception would be as follows:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size 
(except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1

        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
        at 
org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
It seems to be a concurrent update problem of thread unsafe HasSet (see [2, 3]):
{code:java|title=Unsafe HashSet}
Set<UUID> assigns = new HashSet<>();
{code}
{code:java|title=Concurrent update}
grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new 
GridMessageListener() {
    @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
        if (msg instanceof GridJobExecuteRequest) {
            GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;

            if 
(msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
                assigns.add(locNodeId);
        }
    }
});
{code}
With concurrent Set implementation problem is not reproducing (see patch [4]):
{code:java}
Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
{code}

#  [^testClusterSnapshotCheckOtherCluster_printCount.patch] 
# 
https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L287
# 
https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L300
#  [^testClusterSnapshotCheckOtherCluster_fix.patch] 


> Test IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is 
> flaky
> ---------------------------------------------------------------------------------
>
>                 Key: IGNITE-14671
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14671
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.11
>            Reporter: Ilya Shishkov
>            Assignee: Ilya Shishkov
>            Priority: Trivial
>              Labels: iep-43
>         Attachments: testClusterSnapshotCheckOtherCluster_fix.patch, 
> testClusterSnapshotCheckOtherCluster_printCount.patch
>
>
> To reproduce failure, run it several times, for example, set up IDE to run 
> test with 'repeat until failure' option. Then you will get an assertion error:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size 
> (except local node): [a2844419-3081-432a-b611-c4f891900005]
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> With applied patch [1] exception would be as follows:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size 
> (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> It seems to be a concurrent update problem of thread unsafe HasSet (see [2, 
> 3]):
> {code:java|title=Unsafe HashSet}
> Set<UUID> assigns = new HashSet<>();
> {code}
> {code:java|title=Concurrent update}
> grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new 
> GridMessageListener() {
>     @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
>         if (msg instanceof GridJobExecuteRequest) {
>             GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
>             if 
> (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
>                 assigns.add(locNodeId);
>         }
>     }
> });
> {code}
> With concurrent Set implementation problem is not reproducing (see patch [4]):
> {code:java}
> Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
> {code}
> #  [^testClusterSnapshotCheckOtherCluster_printCount.patch] 
> # 
> [IgniteClusterSnapshotCheckTest.java#L287|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L287]
> # 
> [IgniteClusterSnapshotCheckTest.java#L300|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java#L300]
> #  [^testClusterSnapshotCheckOtherCluster_fix.patch] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to