[jira] [Comment Edited] (IGNITE-15996) Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped

2021-12-10 Thread Guilherme Momesso (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457295#comment-17457295
 ] 

Guilherme Momesso edited comment on IGNITE-15996 at 12/10/21, 5:40 PM:
---

I'm facing the same issue while running on a Kubernetes cluster managed by 
Rancher.

As [~krybakova] pointed out, the error starts when the 3rd pod is launched. 
After a while, all three pods keep restarting with the "Node with the same 
ID was found in node IDs history" error. After some time, one pod stays up and 
stable while the other two keep restarting.
I can only run 2 nodes again if I stop all the nodes and start them again.

I'm using Apache Ignite 2.11.0 with TcpDiscoveryKubernetesIpFinder as the IP 
finder. The nodes are AWS EC2 Linux instances. I followed the 
Installation->Kubernetes steps of the documentation, and the only difference I 
can recall is that I configured the K8s Service type as "ClusterIP" instead of 
"LoadBalancer".

Unfortunately, I can't use the suggested workaround.


was (Author: JIRAUSER281553):
I'm facing the same issue while running on a Kubernetes cluster managed by 
Rancher.

As [~krybakova] pointed out, the error starts when the 3rd pod is launched. 
After a while, all three pods keep restarting with the "Node with the same 
ID was found in node IDs history" error. After some time, one pod stays up and 
stable while the other two keep restarting.
I can only run 2 nodes again if I stop all the nodes and start them again.

I'm using TcpDiscoveryKubernetesIpFinder as the IP finder. The nodes are AWS EC2 
Linux instances. I followed the Installation->Kubernetes steps of the 
documentation, and the only difference I can recall is that I configured the 
K8s Service type as "ClusterIP" instead of "LoadBalancer".

Unfortunately, I can't use the suggested workaround.

> Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped
> ---
>
> Key: IGNITE-15996
> URL: https://issues.apache.org/jira/browse/IGNITE-15996
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.10
> Environment: Windows 10, Docker+WSL2
> Reporter: Ksenia Rybakova
> Priority: Major
> Attachments: ignite-47b5227b.0.log, ignite-c072978e.0.log, ignite-c62bc58e.0.log
>
>
> Node in Docker container fails to connect to existing cluster if previously connected node (container) was stopped:
> {noformat}
> [11:27:38,272][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1990)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
>     at org.apache.ignite.Ignition.start(Ignition.java:353)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@21f9277b], reconCnt=10, reconDelay=2000, maxAckTimeout=60, soLinger=0, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
>     ... 11 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configu

[jira] [Comment Edited] (IGNITE-15996) Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped

2021-12-07 Thread Ksenia Rybakova (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454638#comment-17454638
 ] 

Ksenia Rybakova edited comment on IGNITE-15996 at 12/7/21, 2:09 PM:


The issue is reproduced only when running containers on the default network 
(without --network specified) and only on Windows systems. A 3rd node cannot 
join the cluster even if the 2nd node was not stopped.
All required ports (47100, 47500) are open on all nodes, and the nodes can reach 
each other. Netstat shows that a TCP connection to the TcpDiscoverySpi port is 
established, but it is then reset for an unknown reason. 
TCP traffic analysis did not clarify the cause of this behavior.
As a workaround, a user-defined network should be created before running 
the containers:
{noformat}
docker network create my-net
{noformat}
and then the Ignite containers should be run on this network:
{noformat}
docker run -d --net my-net apacheignite/ignite
{noformat}
Docker doc reference: [here|https://docs.docker.com/network/bridge/]

As a resolution, I suggest adding the corresponding info to the 
[documentation|https://ignite.apache.org/docs/latest/installation/installing-using-docker] 
(a recommendation to use a user-defined network when running containers on Windows).
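
For anyone who wants to repeat the port reachability check mentioned above, here is a minimal sketch using only the Java standard library. The class name PortCheck and the method canConnect are hypothetical and not part of Ignite; the host argument and the 2-second timeout are assumptions. Note that a successful connect only shows that the TCP handshake completed, so it would not detect the later connection reset described in this comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the node address is passed as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // 47500 and 47100 are the ports listed in the comment above.
        int[] ports = {47500, 47100};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

Running it from inside one container against the address of another (for example `java PortCheck.java <other-node-ip>`) gives a quick yes/no per port before digging into netstat or packet captures.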

 


was (Author: krybakova):
The issue is reproduced only when running containers on the default network 
(without --network specified) and only on Windows systems. A 3rd node cannot 
join the cluster even if the 2nd node was not stopped.
As a workaround, a user-defined network should be created before running 
the containers:
{noformat}
docker network create my-net
{noformat}
and then the Ignite containers should be run on this network:
{noformat}
docker run -d --net my-net apacheignite/ignite
{noformat}
Docker doc reference: [here|https://docs.docker.com/network/bridge/]

As a resolution, I suggest adding the corresponding info to the 
[documentation|https://ignite.apache.org/docs/latest/installation/installing-using-docker] 
(a recommendation to use a user-defined network when running containers on Windows).

 


[jira] [Comment Edited] (IGNITE-15996) Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped

2021-12-07 Thread Ksenia Rybakova (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454638#comment-17454638
 ] 

Ksenia Rybakova edited comment on IGNITE-15996 at 12/7/21, 1:03 PM:


The issue is reproduced only when running containers on the default network 
(without --network specified) and only on Windows systems. A 3rd node cannot 
join the cluster even if the 2nd node was not stopped.
As a workaround, a user-defined network should be created before running 
the containers:
{noformat}
docker network create my-net
{noformat}
and then the Ignite containers should be run on this network:
{noformat}
docker run -d --net my-net apacheignite/ignite
{noformat}
Docker doc reference: [here|https://docs.docker.com/network/bridge/]

As a resolution, I suggest adding the corresponding info to the 
[documentation|https://ignite.apache.org/docs/latest/installation/installing-using-docker] 
(a recommendation to use a user-defined network when running containers on Windows).

 


was (Author: krybakova):
The issue is reproduced only when running containers on the default network 
(without --network specified) and only on Windows systems. A 3rd node cannot 
join the cluster even if the 2nd node was not stopped.
As a workaround, a user-defined network should be created before running 
the containers:

{{docker network create my-net}}

and then the Ignite containers should be run on this network:

{{docker run -d --net my-net apacheignite/ignite}}

Docker doc reference: [here|https://docs.docker.com/network/bridge/]

As a resolution, I suggest adding the corresponding info to the 
[documentation|https://ignite.apache.org/docs/latest/installation/installing-using-docker] 
(a recommendation to use a user-defined network when running containers on Windows).




 

> Node fails with "Node with the same ID was found" while connecting to the cluster in Docker container if previous container was stopped
> ---
>
> Key: IGNITE-15996
> URL: https://issues.apache.org/jira/browse/IGNITE-15996
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.10
> Environment: Windows 10, Docker+WSL2
> Reporter: Ksenia Rybakova
> Priority: Major
> Attachments: ignite-47b5227b.0.log, ignite-c072978e.0.log, ignite-c62bc58e.0.log
>
>
> Node in Docker container fails to connect to existing cluster if previously connected node (container) was stopped:
> {noformat}
> [11:27:38,272][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1990)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
>     at org.apache.ignite.Ignition.start(Ignition.java:353)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@21f9277b], reconCnt=10, reconDelay=2000, maxAckTimeout=60, soLinger=0, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
>     ... 11 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=c62bc58e-