[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519561#comment-16519561 ]

ASF GitHub Bot commented on IGNITE-8699:

Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/4161

> ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
> --
>
> Key: IGNITE-8699
> URL: https://issues.apache.org/jira/browse/IGNITE-8699
> Project: Ignite
> Issue Type: Bug
> Reporter: Vitaliy Biryukov
> Assignee: Vitaliy Biryukov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: thread-dump-fail-before-local-join
>
>
> *Affected tests:*
> testDisconnectOnServersLeft_1
> testDisconnectOnServersLeft_2
> testDisconnectOnServersLeft_3
> testDisconnectOnServersLeft_4
> testDisconnectOnServersLeft_5
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect event.
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.TestCase.fail(TestCase.java:227)
> at org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4685)
> at org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.disconnectOnServersLeft(ZookeeperDiscoverySpiTest.java:3541)
> at org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testDisconnectOnServersLeft_4(ZookeeperDiscoverySpiTest.java:3476)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at junit.framework.TestCase.runTest(TestCase.java:176)
> at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2086)
> at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:140)
> at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2001)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519560#comment-16519560 ]

Dmitriy Pavlov commented on IGNITE-8699:

[~VitaliyB], [~sergey-chugunov], I've added several code-style changes and merged the change. The changes follow IntelliJ IDEA inspection suggestions, such as renaming the ignored exception variable 'e' to 'ignored' and adding a blank line between semantic blocks.
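For readers unfamiliar with the two style points mentioned in this comment, they might look like the following. This is a minimal sketch and not code from the merged change; the class `IgnoredExceptionStyle` and the method `parseOrDefault` are hypothetical names chosen for illustration.

```java
public class IgnoredExceptionStyle {
    /** Parses an integer, falling back to a default when the input is not a number. */
    public static int parseOrDefault(String s, int dflt) {
        try {
            return Integer.parseInt(s.trim());
        }
        catch (NumberFormatException ignored) {
            // Deliberately swallowed: the name 'ignored' documents the intent
            // and keeps the IDE inspection quiet.
        }

        // The blank line above separates the parsing block from the fallback block.
        return dflt;
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("47500", -1));
        System.out.println(parseOrDefault("not-a-number", -1));
    }
}
```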
[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519330#comment-16519330 ]

Sergey Chugunov commented on IGNITE-8699:

[~VitaliyB], I triggered the ZooKeeper (Discovery) 2 suite, and the results look good to me: [TC link|https://ci.ignite.apache.org/viewLog.html?buildId=1374005&buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1&tab=buildResultsDiv]

We can proceed with merging.
[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518078#comment-16518078 ]

Vitaliy Biryukov commented on IGNITE-8699:

[~sergey-chugunov], is the full *ZooKeeper (Discovery) 1* suite enough? [TC link|https://ci.ignite.apache.org/viewLog.html?buildId=1374005&buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1&tab=buildResultsDiv]

You are right about option#1. This case sometimes reproduces on my Linux machine. Here is the relevant piece of the thread dump (the full thread dump is in the attachments):
{noformat}
Thread [name="disco-event-worker-#2605%internal.ZookeeperDiscoverySpiTest5%", id=3211, state=WAITING, blockCnt=2, waitCnt=6]
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    at o.a.i.i.managers.discovery.GridDiscoveryManager.localJoin(GridDiscoveryManager.java:2190)
    at o.a.i.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest$2.apply(ZookeeperDiscoverySpiTest.java:315)
    - locked java.util.TreeMap@38081448
    at o.a.i.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest$2.apply(ZookeeperDiscoverySpiTest.java:295)
    at o.a.i.i.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1477)
    at o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873)
    at o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:858)
    at o.a.i.i.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
    at o.a.i.i.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
    at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2703)
    at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2920)
    at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2732)
    at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
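The thread dump in this comment shows the discovery event worker parked inside a test listener that waits on the local-join future. A minimal plain-Java sketch of that general pattern follows, assuming a single-threaded event worker that delivers events in order. It is not Ignite code: `BlockedEventWorkerSketch`, `disconnectDelivered`, and `locJoinFut` are hypothetical names, and a `CompletableFuture` stands in for the future the listener waits on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockedEventWorkerSketch {
    /**
     * Returns true if the "disconnect" event is delivered within the timeout.
     *
     * @param joinCompletes Whether the stand-in local-join future is completed in time.
     * @param timeoutMs How long the "test" waits for the disconnect event.
     */
    public static boolean disconnectDelivered(boolean joinCompletes, long timeoutMs) {
        // One thread stands in for the event worker: events are handled strictly in order.
        ExecutorService evtWorker = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> locJoinFut = new CompletableFuture<>();
        CountDownLatch disconnectLatch = new CountDownLatch(1);

        try {
            // Event 1: a listener that blocks until the local join finishes.
            evtWorker.execute(() -> { locJoinFut.join(); });

            // Event 2: the disconnect notification, queued behind event 1.
            evtWorker.execute(disconnectLatch::countDown);

            if (joinCompletes)
                locJoinFut.complete(null);

            return disconnectLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
        finally {
            // Unblock the listener so the worker thread can exit.
            locJoinFut.complete(null);
            evtWorker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("join completes:       " + disconnectDelivered(true, 2000));
        System.out.println("join never completes: " + disconnectDelivered(false, 300));
    }
}
```

In the second call the disconnect event is never delivered because the worker is still stuck in the first listener, which is the same shape as a test timing out while waiting for a disconnect/reconnect event.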
[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518042#comment-16518042 ]

Sergey Chugunov commented on IGNITE-8699:

[~VitaliyB], the change looks reasonable to me as well, but I think we should run all ZooKeeper-related tests on this change (so far we have TC results only for 4 isolated tests).

I'm also curious about option#1 for the test failure. Could you share a stack trace showing how *DiscoveryWorker* hangs? Do I understand correctly that it happens when the client node has not finished the local join procedure before the very last server dies, so the client ends up in some undefined state?
[jira] [Commented] (IGNITE-8699) ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
[ https://issues.apache.org/jira/browse/IGNITE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515545#comment-16515545 ]

Pavel Pereslegin commented on IGNITE-8699:

[~VitaliyB], looks good to me.

> ZookeeperDiscoverySpiTest#testDisconnectOnServersLeft flaky fails (rarely)
> --
>
> Key: IGNITE-8699
> URL: https://issues.apache.org/jira/browse/IGNITE-8699
> Project: Ignite
> Issue Type: Bug
> Reporter: Vitaliy Biryukov
> Assignee: Vitaliy Biryukov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
>
> *Affected tests:*
> testDisconnectOnServersLeft_1
> testDisconnectOnServersLeft_2
> testDisconnectOnServersLeft_3
> testDisconnectOnServersLeft_4
> testDisconnectOnServersLeft_5
> *Causes:*
> * Sometimes client nodes don't have time to join the topology.
> * Sometimes the communication failure resolver starts and waits for server nodes.
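One common way to guard a test against the first cause listed in the description is to wait, with a timeout, until the expected condition holds before moving on to the next step. The sketch below shows that idea in plain Java. It is an assumption made for illustration, not the change made in the pull request; `WaitForJoinSketch`, `waitForCondition`, and the `joinedNodes` counter are hypothetical stand-ins for a real topology check.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForJoinSketch {
    /** Polls the condition until it holds or the timeout expires; returns the final outcome. */
    public static boolean waitForCondition(BooleanSupplier cond, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;

        while (true) {
            if (cond.getAsBoolean())
                return true;

            if (System.nanoTime() - deadline >= 0)
                return false;

            try {
                Thread.sleep(10);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the number of nodes visible in the topology.
        AtomicInteger joinedNodes = new AtomicInteger(1);

        // Simulate a client that joins a little later.
        new Thread(() -> {
            try {
                Thread.sleep(50);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            joinedNodes.incrementAndGet();
        }).start();

        // Only proceed (for example, to stopping servers) once both nodes are visible.
        System.out.println("client joined: " + waitForCondition(() -> joinedNodes.get() == 2, 2000));
    }
}
```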