Re: Looks like a bug in ServerImpl.joinTopology()
Alex, I checked reproducer you presented, please fix it according to the following rules 1) Never use System.out.println() as a part of reproducer, use assetrions if necessary. 2) Reproducer should be small. As small as possible. 3) Try to make reproducer clear. As crear as possible. In case following code can be simplified - it should be. if (flag && latch.getCount() > 0) { fut.onDone(this); 3.1) Newer use variable's names like 'flag', this makes reviewers nervous. пт, 27 апр. 2018 г. в 17:40, Александр Меньшиков : > Yakov, thank you for the advice. > > The thread.sleep is not enough, but some latch + future give me a way to > the > reproducer. > > I have created PR [1] into my master, for showing a test and modification > of > ServerImpl which help me to slow down execution inside a danger section. > > A code of test a bit long, but basically it about two parts: > > In the first part, I randomly start and stop nodes to get a moment when > a server is starting to execute the dangerous code which I described in the > first message. > > In the second part, I'm waiting while the first part produces this > situation > and after that, I call public method of ServerImpl which fails with an > exception: > > java.lang.AssertionError: Invalid node order: TcpDiscoveryNode > [id=f6bf048d-378b-4960-94cb-84e3d332, addrs=[127.0.0.1], sockAddrs=[/ > 127.0.0.1:47502], discPort=47502, order=0, intOrder=2, > lastExchangeTime=1524836605995, loc=false, > ver=2.5.0#20180426-sha1:34e22396, isClient=false] > at > > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52) > at > > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49) > at > org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2014) > at > > org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9679) > at > > org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9652) > at > > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590) > at > > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleRemoteNodes(TcpDiscoveryNodesRing.java:164) > at > > org.apache.ignite.spi.discovery.tcp.ServerImpl.getRemoteNodes(ServerImpl.java:304) > > As I told in the first message the problem arises because of the current > code > changes local node internal order and breaks sorting in > TcpDiscoveryNodesRing.nodes collection. > > Is this reproducer convince enough? > > [1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files > > > > 2018-02-13 20:17 GMT+03:00 Yakov Zhdanov : > > > Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx) > > anywhere you like to show the incorrect behavior you describe. > > > > --Yakov > > >
Re: Looks like a bug in ServerImpl.joinTopology()
+ Anton Vinogradov 2018-04-27 17:40 GMT+03:00 Александр Меньшиков : > Yakov, thank you for the advice. > > The thread.sleep is not enough, but some latch + future give me a way to > the > reproducer. > > I have created PR [1] into my master, for showing a test and modification > of > ServerImpl which help me to slow down execution inside a danger section. > > A code of test a bit long, but basically it about two parts: > > In the first part, I randomly start and stop nodes to get a moment when > a server is starting to execute the dangerous code which I described in the > first message. > > In the second part, I'm waiting while the first part produces this > situation > and after that, I call public method of ServerImpl which fails with an > exception: > > java.lang.AssertionError: Invalid node order: TcpDiscoveryNode > [id=f6bf048d-378b-4960-94cb-84e3d332, addrs=[127.0.0.1], sockAddrs=[/ > 127.0.0.1:47502], discPort=47502, order=0, intOrder=2, > lastExchangeTime=1524836605995, > loc=false, ver=2.5.0#20180426-sha1:34e22396, isClient=false] > at org.apache.ignite.spi.discovery.tcp.internal. > TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52) > at org.apache.ignite.spi.discovery.tcp.internal. > TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49) > at org.apache.ignite.internal.util.lang.GridFunc.isAll( > GridFunc.java:2014) > at org.apache.ignite.internal.util.IgniteUtils.arrayList( > IgniteUtils.java:9679) > at org.apache.ignite.internal.util.IgniteUtils.arrayList( > IgniteUtils.java:9652) > at org.apache.ignite.spi.discovery.tcp.internal. > TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590) > at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing. > visibleRemoteNodes(TcpDiscoveryNodesRing.java:164) > at org.apache.ignite.spi.discovery.tcp.ServerImpl. > getRemoteNodes(ServerImpl.java:304) > > As I told in the first message the problem arises because of the current > code > changes local node internal order and breaks sorting in > TcpDiscoveryNodesRing.nodes collection. > > Is this reproducer convince enough? > > [1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files > > > > > 2018-02-13 20:17 GMT+03:00 Yakov Zhdanov : > >> Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx) >> anywhere you like to show the incorrect behavior you describe. >> >> --Yakov >> > >
Re: Looks like a bug in ServerImpl.joinTopology()
Yakov, thank you for the advice. The thread.sleep is not enough, but some latch + future give me a way to the reproducer. I have created PR [1] into my master, for showing a test and modification of ServerImpl which help me to slow down execution inside a danger section. A code of test a bit long, but basically it about two parts: In the first part, I randomly start and stop nodes to get a moment when a server is starting to execute the dangerous code which I described in the first message. In the second part, I'm waiting while the first part produces this situation and after that, I call public method of ServerImpl which fails with an exception: java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=f6bf048d-378b-4960-94cb-84e3d332, addrs=[127.0.0.1], sockAddrs=[/ 127.0.0.1:47502], discPort=47502, order=0, intOrder=2, lastExchangeTime=1524836605995, loc=false, ver=2.5.0#20180426-sha1:34e22396, isClient=false] at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49) at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2014) at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9679) at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9652) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleRemoteNodes(TcpDiscoveryNodesRing.java:164) at org.apache.ignite.spi.discovery.tcp.ServerImpl.getRemoteNodes(ServerImpl.java:304) As I told in the first message the problem arises because of the current code changes local node internal order and breaks sorting in TcpDiscoveryNodesRing.nodes collection. Is this reproducer convince enough? [1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files 2018-02-13 20:17 GMT+03:00 Yakov Zhdanov : > Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx) > anywhere you like to show the incorrect behavior you describe. > > --Yakov >
Re: Looks like a bug in ServerImpl.joinTopology()
Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx) anywhere you like to show the incorrect behavior you describe. --Yakov