[ https://issues.apache.org/jira/browse/IGNITE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Kasnacheev updated IGNITE-8633:
------------------------------------
    Attachment: 8633.zip

> Node fails to bail out of wrong BLT, instead hanging around indefinitely
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-8633
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8633
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.4
>            Reporter: Ilya Kasnacheev
>            Assignee: Stanislav Lukyanov
>            Priority: Major
>         Attachments: 8633.zip
>
> Follow-up on
> https://stackoverflow.com/questions/50234056/how-to-give-multiple-static-ip-in-apache-ignite-cache-configuration-xml-file/50270676?noredirect=1#comment88095814_50270676
> but not quite the same.
> I have three nodes: A, B and C.
> I've started A and C and performed activation.
> Then I stopped them both, started B and performed activation on it.
> Now I have two BLT clusters: (A, C) and (B).
> However, when I start B and then try to launch node A or C, I get inconsistent behavior:
> When I launch C, I get the error:
> {code}
> org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (8c1e210f-52bb-424f-9c7c-a2e7b1bab546 ) is not compatible with BaselineTopology in the cluster. Branching history of cluster BlT ([-1349069127]) doesn't contain branching point hash of joining node BlT (631694798). Consider cleaning persistent storage of the node and adding it to the cluster again.
> {code}
> But when I launch A, it never enters the topology, yet it also never fails.
> Moreover, A and B will ping-pong each other for eternity:
> {code}
> [20:16:38,596][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
> [20:17:29,514][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,522][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,523][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,524][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,525][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,526][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> {code}
> {code}
> [20:16:28,793][INFO][tcp-disco-msg-worker-#3][GridSnapshotAwareClusterStateProcessorImpl] Received state change finish message: true
> [20:16:28,803][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], crd=true]
> [20:16:28,812][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=1, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=37104137-a21e-4b6f-a70b-09164300bbfc]
> [20:16:28,818][INFO][sys-#68][GridSnapshotAwareClusterStateProcessorImpl] Successfully performed final activation steps [nodeId=37104137-a21e-4b6f-a70b-09164300bbfc, client=false, topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
> [20:16:33,571][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,579][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,580][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500]
> [20:16:33,592][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500]
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,802][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714]
> [20:16:39,806][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714]
> {code}
> I don't think this is expected behaviour. I will attach config and work directories.
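For reference, the "performed activation" step above is what records a BaselineTopology branch on persistence-enabled nodes: activating (A, C) once and later activating (B) alone yields the two branching histories that the IgniteSpiException complains about. The following is a minimal sketch of that step in Java; the instance name, consistent ID and discovery addresses/ports are illustrative assumptions, not values taken from the attached configs.
{code}
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ActivateNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            // Illustrative values; the real cluster uses the attached XML configs.
            .setIgniteInstanceName("nodeA")
            .setConsistentId("nodeA");

        // Native persistence must be enabled, otherwise no BaselineTopology is recorded.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        // Static IP finder with the two hosts seen in the logs (discovery ports are assumptions).
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder()
            .setAddresses(Arrays.asList("172.25.1.35:47500..47509", "172.25.1.36:47500..47509"));
        cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));

        Ignite ignite = Ignition.start(cfg);

        // Manual activation: performed first on (A, C) and later on (B) alone,
        // which produces two BaselineTopology branching histories that cannot be merged.
        ignite.cluster().active(true);
    }
}
{code}
Running the equivalent of this first on A and C, then (with both stopped) on B alone, should recreate the divergent-BLT state described in this report.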
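The repeated-join warning in the first log block suggests increasing 'TcpDiscoverySpi.networkTimeout' (the log shows the default, networkTimeout=5000). For completeness, a hedged sketch of setting that property is below; the 15-second value is an arbitrary example, and raising the timeout only changes how long discovery operations wait, so it would not by itself reconcile the incompatible BaselineTopology histories this ticket is about.
{code}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class LongerDiscoveryTimeout {
    public static void main(String[] args) {
        // Raise the discovery network timeout from the default 5000 ms to an example 15000 ms.
        TcpDiscoverySpi spi = new TcpDiscoverySpi().setNetworkTimeout(15_000);

        IgniteConfiguration cfg = new IgniteConfiguration().setDiscoverySpi(spi);

        // Start the node with the adjusted discovery SPI.
        Ignite ignite = Ignition.start(cfg);
    }
}
{code}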