[jira] [Comment Edited] (IGNITE-15996) Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped

Ksenia Rybakova (Jira) Tue, 07 Dec 2021 06:10:07 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454638#comment-17454638
 ]


Ksenia Rybakova edited comment on IGNITE-15996 at 12/7/21, 2:09 PM:
--------------------------------------------------------------------

The issue reproduces only when containers run on the default network 
(without --network specified) and only on Windows systems. A 3rd node cannot 
join the cluster even if the 2nd node was not stopped.
All required ports (47100, 47500) are open on all nodes, and the nodes can 
reach each other. Netstat shows that a TCP connection to the TcpDiscoverySpi 
port is established but is then reset for an unknown reason. 
TCP traffic analysis did not make the cause of this behavior any clearer.
As a workaround, create a user-defined network before running the 
containers:
{noformat}
docker network create my-net{noformat}
and then run the Ignite containers on this network:
{noformat}
docker run -d --net my-net apacheignite/ignite{noformat}
See the Docker documentation [here|https://docs.docker.com/network/bridge/].
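
The workaround can be scripted so that the network is created only once and every node is attached to it. The following is a minimal sketch, not a tested fix: the network name my-net and the image tag come from this issue, while the NODES count, the function names and the "up" argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the workaround: run every Ignite node on one user-defined
# bridge network instead of the default one.
# NETWORK, IMAGE and NODES are illustrative defaults; override them via
# environment variables as needed.
set -eu

NETWORK="${NETWORK:-my-net}"
IMAGE="${IMAGE:-apacheignite/ignite:2.11.0}"
NODES="${NODES:-3}"

# Create the user-defined network only if it does not exist yet.
ensure_network() {
  if ! docker network inspect "$NETWORK" >/dev/null 2>&1; then
    docker network create "$NETWORK"
  fi
}

# Start one detached Ignite container attached to that network.
start_node() {
  docker run -d --net "$NETWORK" "$IMAGE"
}

# Ensure the network exists, then start NODES containers on it.
main() {
  ensure_network
  i=0
  while [ "$i" -lt "$NODES" ]; do
    start_node
    i=$((i + 1))
  done
}

# Run only when invoked as: ./ignite-net.sh up
if [ "${1:-}" = "up" ]; then
  main
fi
```

Saved under a hypothetical name such as ignite-net.sh, it would be started with "sh ignite-net.sh up". Checking for the network first lets the script be re-run without failing on an already existing network.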

As a resolution, I suggest adding the corresponding information to the 
[documentation|https://ignite.apache.org/docs/latest/installation/installing-using-docker] 
(a recommendation to use a user-defined network when running containers on 
Windows).

 


was (Author: krybakova):
The issue is reproduced only when running containers with default network 
(without --network specified) and only on Windows systems. 3rd node can not 
join the cluster even if 2nd node was not stopped.
As a workaround an user-defined network should be created before running 
containers:
{noformat}
docker network create my-net{noformat}
and then run ignite containers at this network
{noformat}
docker run -d --net my-net apacheignite/ignite{noformat}
Docker doc reference [here|https://docs.docker.com/network/bridge/]

As a resolution suggest adding corresponding info to [documentation 
|https://ignite.apache.org/docs/latest/installation/installing-using-docker](recommendation
 to use user-defined network when running containers on Windows).

 

> Node fails with "Node with the same ID was found" while connecting to the 
> cluster in Docker container if previous container was stopped
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15996
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15996
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.10
>         Environment: Windows 10, Docker+WSL2
>            Reporter: Ksenia Rybakova
>            Priority: Major
>         Attachments: ignite-47b5227b.0.log, ignite-c072978e.0.log, 
> ignite-c62bc58e.0.log
>
>
> A node in a Docker container fails to connect to an existing cluster if a 
> previously connected node (container) was stopped:
> {noformat}
> [11:27:38,272][SEVERE][main][IgniteKernal] Got exception while starting (will 
> rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start manager: 
> GridManagerAdapter [enabled=true, 
> name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
>     at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1990)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
>     at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
>     at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
>     at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
>     at org.apache.ignite.Ignition.start(Ignition.java:353)
>     at 
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start 
> SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, 
> ackTimeout=5000, marsh=JdkMarshaller 
> [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@21f9277b], 
> reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=0, 
> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, 
> skipAddrsRandomization=false]
>     at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
>     at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
>     at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
>     ... 11 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) [localNode=TcpDiscoveryNode 
> [id=c62bc58e-102a-4928-8e54-ac8a56bf4d44, 
> consistentId=127.0.0.1,172.17.0.4:47500, addrs=ArrayList [127.0.0.1, 
> 172.17.0.4], sockAddrs=HashSet [402b337a50dd/172.17.0.4:47500, 
> /127.0.0.1:47500], discPort=47500, order=0, intOrder=3, 
> lastExchangeTime=1637839658247, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, 
> isClient=false], existingNode=c62bc58e-102a-4928-8e54-ac8a56bf4d44]
>     at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:2083)
>     at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1201)
>     at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:473)
>     at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2207)
>     at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
>     ... 13 more{noformat}
> Steps to reproduce:
> 1) Download the Ignite Docker image
> {code:java}
> docker pull apacheignite/ignite:2.11.0{code}
> 2) Start node 1 (a local directory is mounted to save logs)
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w1:/opt/ignite/apache-ignite/work 
> apacheignite/ignite:2.11.0 
> c5219b095c93ec56731eec9fa871ffb722ddead987256198d76889f4a1a8ea3e{code}
> 3) Start node 2
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w2:/opt/ignite/apache-ignite/work 
> apacheignite/ignite:2.11.0 
> 65fdae68a40b2d3d17ab7e560320ef6757713d8efacbc25a26aecca03be6f975{code}
> 4) Stop the container for node 2
> {code:java}
> docker stop 65fdae68a40b{code}
> 5) Start node 3
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w3:/opt/ignite/apache-ignite/work 
> apacheignite/ignite:2.11.0{code}
> Expected: node 3 joins the cluster successfully.
> Actual: node 3 fails with "IgniteSpiException: Node with the same ID was 
> found in node IDs history or existing node in topology has the same ID." 
> although its ID appears to be unique. 
> Logs are attached:
> node 1 - ignite-47b5227b.0.log,
> node 2 - ignite-c072978e.0.log,
> node 3 - ignite-c62bc58e.0.log.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
