Yakov, thank you for the advice. Thread.sleep alone was not enough, but a latch plus a future gave me a way to build the reproducer.
I have created PR [1] against my master to show a test and a modification of ServerImpl that lets me slow down execution inside the dangerous section.

The test code is a bit long, but it basically has two parts. In the first part, I randomly start and stop nodes to reach the moment when a server starts executing the dangerous code that I described in the first message. In the second part, I wait until the first part produces this situation, and then I call a public method of ServerImpl, which fails with an exception:

java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=f6bf048d-378b-4960-94cb-84e3d3300002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=0, intOrder=2, lastExchangeTime=1524836605995, loc=false, ver=2.5.0#20180426-sha1:34e22396, isClient=false]
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49)
    at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2014)
    at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9679)
    at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9652)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleRemoteNodes(TcpDiscoveryNodesRing.java:164)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.getRemoteNodes(ServerImpl.java:304)

As I said in the first message, the problem arises because the current code changes the local node's internal order, which breaks the sorting of the TcpDiscoveryNodesRing.nodes collection.

Is this reproducer convincing enough?
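For readers who do not want to open the PR, here is a minimal standalone sketch of the same idea. It is not the code from the PR and it does not use any Ignite classes. OrderRaceSketch, Node, isValidOrder and readerSeesValidRing are hypothetical names invented for illustration, and the check only loosely imitates the "Invalid node order" assertion. A writer thread changes a node's order while the node is already in a sorted list, a latch holds it inside that window, and the reader validates the list at exactly that moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the race; none of these names come from Ignite. */
class OrderRaceSketch {
    /** Simplified node whose sort key can change after it was added to the ring. */
    static final class Node {
        final String id;
        volatile long order;

        Node(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    /** Returns true if all orders are positive and strictly increasing. */
    static boolean isValidOrder(List<Node> ring) {
        long prev = 0;
        for (Node n : ring) {
            long cur = n.order;
            if (cur <= prev)
                return false;
            prev = cur;
        }
        return true;
    }

    /** Runs the race once and reports what the reader saw inside the window. */
    static boolean readerSeesValidRing() {
        List<Node> ring = new ArrayList<>();
        ring.add(new Node("a", 1));
        Node changed = new Node("b", 2);
        ring.add(changed);
        ring.add(new Node("c", 3));

        CountDownLatch inWindow = new CountDownLatch(1); // writer is inside the section
        CountDownLatch release = new CountDownLatch(1);  // reader has finished its check
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<?> writer = exec.submit(() -> {
                long saved = changed.order;
                changed.order = 0; // sort key changed while the node is in the ring
                inWindow.countDown();
                try {
                    release.await(); // stay inside the section until the reader is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                changed.order = saved;
            });

            if (!inWindow.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("writer never entered the section");

            boolean valid = isValidOrder(ring); // read exactly inside the window
            release.countDown();
            writer.get(5, TimeUnit.SECONDS);
            return valid;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("valid inside window: " + readerSeesValidRing());
    }
}
```

Because the latches pin the reader inside the window, the check fails on every run instead of only occasionally, which is what Thread.sleep alone could not guarantee.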
[1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files

2018-02-13 20:17 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx)
> anywhere you like to show the incorrect behavior you describe.
>
> --Yakov
>