[ 
https://issues.apache.org/jira/browse/IGNITE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Plekhanov updated IGNITE-8785:
--------------------------------------
    Fix Version/s:     (was: 2.9)
                   2.10

> Node may hang indefinitely in CONNECTING state during cluster segmentation
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8785
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8785
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.5
>            Reporter: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.10
>
>
> Affected test: 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest#testTopologyValidatorWithCacheGroup
> The node hangs with the following stack trace:
> {noformat}
> "grid-starter-testTopologyValidatorWithCacheGroup-22" #117619 prio=5 os_prio=0 tid=0x00007f17dd19b800 nid=0x304a in Object.wait() [0x00007f16b19df000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:931)
>       - locked <0x0000000705ee4a60> (a java.lang.Object)
>       at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:373)
>       at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1948)
>       at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
>       at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:915)
>       at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1739)
>       at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1046)
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
>       - locked <0x0000000705995ec0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>       at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
>       at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
>       at org.apache.ignite.testframework.junits.GridAbstractTest$3.call(GridAbstractTest.java:742)
>       at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
> {noformat}
> It seems that the node never receives an acknowledgment from the coordinator.
> There were some failures before that:
> {noformat}
> [org.apache.ignite:ignite-core] [2018-06-10 04:59:18,876][WARN ][grid-starter-testTopologyValidatorWithCacheGroup-22][IgniteCacheTopologySplitAbstractTest$SplitTcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
> {noformat}
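
The stack trace shows the starting thread in TIMED_WAITING inside Object.wait(), called from ServerImpl.joinTopology, so the thread wakes up periodically but never leaves the wait. The following is a minimal, hypothetical sketch of how a periodic wait of that kind can be bounded by an overall deadline so that a missing acknowledgment surfaces as an error instead of a hang. It is not Ignite's code: the class BoundedJoinWait, its methods, and both timeout parameters are assumptions made up for illustration.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; this is NOT Ignite's ServerImpl. */
final class BoundedJoinWait {
    /** Monitor that guards the connected flag. */
    private final Object mux = new Object();

    /** Set once the (hypothetical) join acknowledgment has arrived. */
    private boolean connected;

    /** Called by whatever thread receives the acknowledgment. */
    void onConnected() {
        synchronized (mux) {
            connected = true;
            mux.notifyAll();
        }
    }

    /**
     * Waits until onConnected() has been called, waking up at most every
     * pollMs milliseconds, and gives up after joinTimeoutMs milliseconds.
     */
    void awaitJoin(long pollMs, long joinTimeoutMs)
        throws InterruptedException, TimeoutException {
        if (pollMs <= 0)
            throw new IllegalArgumentException("pollMs must be positive");

        long deadline = System.nanoTime() + joinTimeoutMs * 1_000_000L;

        synchronized (mux) {
            while (!connected) {
                long leftMs = (deadline - System.nanoTime()) / 1_000_000L;

                // Without this check the loop would keep waiting forever.
                if (leftMs <= 0)
                    throw new TimeoutException(
                        "Join not acknowledged within " + joinTimeoutMs + " ms");

                // Never pass 0 to wait(): that would mean "wait indefinitely".
                mux.wait(Math.min(pollMs, leftMs));
            }
        }
    }
}
```

With a deadline like this, a test that starts a node during segmentation would fail with a TimeoutException at the join step rather than block the test thread.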



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
