Re: Ignite cluster going down frequently
Hello!

[04:45:53,179][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=1, rmtAddr=/10.201.30.64:47603, rmtPort=47603]
[04:45:53,180][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=TcpDiscoveryNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47603], discPort=47603, order=53, intOrder=32, lastExchangeTime=1542272829304, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=ClusterNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, order=53, addr=[10.201.30.64], daemon=true]]]
[04:45:53,190][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

and then, on another node:

[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]

I think that you either have long GC pauses or a flaky network (or the system goes into swapping, or the like). Consider increasing 'ackTimeout' and/or 'failureDetectionTimeout'. Also consider collecting GC logs for your nodes and looking through them for a root cause.

Regards,
--
Ilya Kasnacheev
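For reference, the tuning suggested above looks roughly like this when done programmatically. This is a minimal sketch: the setters are the standard IgniteConfiguration/TcpDiscoverySpi API, but the timeout values are purely illustrative, not a recommendation for this particular cluster; the same properties can equally be set in an XML configuration file.

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

    public class TimeoutTuning {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Simplest knob: one aggregate timeout (default 10_000 ms)
            // from which Ignite derives the per-operation timeouts.
            cfg.setFailureDetectionTimeout(30_000);

            // Or tune the discovery SPI directly. Note: once ackTimeout or
            // socketTimeout is set explicitly, failureDetectionTimeout is
            // no longer applied to the discovery SPI.
            TcpDiscoverySpi spi = new TcpDiscoverySpi();
            spi.setAckTimeout(10_000);    // illustrative value
            spi.setSocketTimeout(10_000); // illustrative value
            cfg.setDiscoverySpi(spi);

            Ignition.start(cfg);
        }
    }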
On Fri, 30 Nov 2018 at 14:01, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi Ilya Kasnacheev,
>
> I am attaching all logs from the second server (10.201.30.64).
> Please let me know if you need any other details.
>
> Thanks and Regards,
> Hemasundar.
>
> On Fri, 30 Nov 2018 at 09:40, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
>> Hi Ilya Kasnacheev,
>>
>> We are running one cluster node (10.201.30.63). I am attaching all logs from this server.
>> Please let me know if you need any other details.
>>
>> Thanks and Regards,
>> Hemasundar.
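As for collecting the GC logs suggested above: on the Java 8 JVMs that Ignite 2.4 typically runs on, this is a matter of JVM flags, for example passed through the JVM_OPTS environment variable that ignite.sh picks up (the log path below is illustrative, and JVM_OPTS is an assumption about how these nodes are launched):

    export JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/ignite/gc.log \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime"

Long pauses reported under PrintGCApplicationStoppedTime are exactly the kind that would make a node miss discovery acknowledgements.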
Re: Ignite cluster going down frequently
Hello!

It is not clear from this log alone why this node became segmented. Do you have the log from the other server node in the topology? It was the coordinator, so maybe it was the one experiencing problems.

Regards,
--
Ilya Kasnacheev

On Wed, 28 Nov 2018 at 13:56, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Did you get a chance to go through the attached log?
Re: Ignite cluster going down frequently
Hi Ilya Kasnacheev,

Did you get a chance to go through the attached log?
This is one of the critical issues we are facing in our dev environment. Your input would be a great help in finding what is causing this issue and a probable solution to it.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 16:54, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi Ilya Kasnacheev,
> I have attached the log file.
>
> Regards,
> Hemasundar.
>
> On Mon, 26 Nov 2018 at 16:50, Ilya Kasnacheev wrote:
>> It's hard to say; can you provide more logs from the node before it segments?
Re: Ignite cluster going down frequently
Hello!

Maybe you have some data in your caches that causes runaway heap usage in your own code. Previously you did not have such data, or code that would react in such a fashion.

It's hard to say; can you provide more logs from the node before it segments?

Regards,
--
Ilya Kasnacheev

On Mon, 26 Nov 2018 at 14:17, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> The grid-down issue happens after it has been running successfully for 2 to 3 days.
Re: Ignite cluster going down frequently
Thank you very much, Ilya Kasnacheev, for your response.

We load the data initially; after that, only small delta changes are applied.
The grid-down issue happens after it has been running successfully for 2 to 3 days.
Once the issue starts, it repeats frequently, and we have not found any clue.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 13:43, Ilya Kasnacheev wrote:
> A node will get segmented if other nodes fail to wait for a Discovery response from that node.
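Relevant to the initial load described above, and to the advice elsewhere in this thread to keep the load from causing heap usage spikes: the usual way to bulk-load Ignite without building large on-heap batches is IgniteDataStreamer. A minimal sketch; the cache name, key/value types, and configuration path are hypothetical, not taken from the attached files:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class InitialLoad {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start("ignite-config.xml");
                 IgniteDataStreamer<Long, String> st = ignite.dataStreamer("staticDataCache")) {
                st.perNodeBufferSize(1024); // keep per-node buffers modest
                for (long i = 0; i < 1_000_000; i++)
                    st.addData(i, "value-" + i); // batched and flushed internally
            } // close() flushes whatever is still buffered
        }
    }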
Re: Ignite cluster going down frequently
Hello!

A node will get segmented if other nodes fail to wait for a Discovery response from that node. This usually means either network problems or long GC pauses caused by insufficient heap on one of the nodes.

Make sure your data load process does not cause heap usage spikes.

Regards,
--
Ilya Kasnacheev

On Fri, 23 Nov 2018 at 07:54, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi All,
> We are running a two-node Ignite server cluster.
> It was running without any issue for almost 5 days. We are using this grid for static data. The Ignite process runs with around 8 GB of memory after we load our data.
> Suddenly the grid server nodes are going down; we tried 3 times to run the server nodes and load the static data, and the server nodes keep going down again and again.
>
> Please let us know how to overcome this kind of issue.
>
> Attached are the log file and configuration file.
>
> Following is part of the log from one server:
>
> [04:45:58,335][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
> [04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
> [04:45:58,335][INFO][tcp-disco-sock-reader-#78%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:36695, rmtPort=36695]
> [04:45:58,337][INFO][tcp-disco-sock-reader-#70%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.172:58418, rmtPort=58418]
> [04:45:58,337][INFO][tcp-disco-sock-reader-#74%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.10.125:63403, rmtPort=63403]
> [04:46:01,516][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 6a603d8b-f8bf-40bf-af50-6c04a56b572e
> [04:46:01,546][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=6a603d8b-f8bf-40bf-af50-6c04a56b572e, res=true, time=49ms]
> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 5ec6ee69-075e-4829-84ca-ae40411c7bc3
> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=5ec6ee69-075e-4829-84ca-ae40411c7bc3, res=false, time=7ms]
> [04:46:08,283][INFO][tcp-disco-sock-reader-#4%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:48038, rmtPort=48038]
> [04:46:08,367][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Restarting JVM according to configured segmentation policy.
> [04:46:08,388][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=680, servers=1, clients=17, CPUs=36, offheap=8.0GB, heap=84.0GB]
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Data Regions Configured:
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]   ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB, persistenceEnabled=false]
> [04:46:08,396][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, customEvt=null, allowMerge=true]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], waitTime=0ms, futInfo=NA]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=680, minorTopVer=0]]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridCachePartitionExchangeManager] Stop merge, custom task found: WalStateNodeLeaveExchangeTask [node=TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] finishExchang
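A side note on the "Restarting JVM according to configured segmentation policy" line in the log above: that behaviour corresponds to SegmentationPolicy.RESTART_JVM in the node configuration. A minimal sketch; the policy value is inferred from the log message, since the attached configuration file is not shown in the thread:

    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

    public class SegmentationConfig {
        public static IgniteConfiguration configure() {
            IgniteConfiguration cfg = new IgniteConfiguration();
            // RESTART_JVM makes a segmented node restart itself; STOP and
            // NOOP are the alternatives for handling segmentation manually.
            cfg.setSegmentationPolicy(SegmentationPolicy.RESTART_JVM);
            return cfg;
        }
    }

Whatever the policy, the discovery timeouts discussed earlier still need to be large enough that ordinary GC pauses do not get a node segmented in the first place.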