Re: Ignite cluster going down frequently
Hello!

[04:45:53,179][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=1, rmtAddr=/10.201.30.64:47603, rmtPort=47603]
[04:45:53,180][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=TcpDiscoveryNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47603], discPort=47603, order=53, intOrder=32, lastExchangeTime=1542272829304, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=ClusterNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, order=53, addr=[10.201.30.64], daemon=true]]]
[04:45:53,190][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

and then, on another node:

[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]

I think that you either have long GC pauses or a flaky network (or the system goes into swapping, or the like). Consider increasing 'ackTimeout' and/or 'failureDetectionTimeout'. Also consider collecting GC logs for your nodes and looking through them for a root cause.

Regards,
--
Ilya Kasnacheev
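For reference, the tuning suggested above looks roughly like this when done programmatically. This is a minimal sketch: the setters are the standard IgniteConfiguration/TcpDiscoverySpi API, but the timeout values are purely illustrative, not a recommendation for this particular cluster; the same properties can equally be set in an XML configuration file.

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

    public class TimeoutTuning {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Simplest knob: one aggregate timeout (default 10_000 ms)
            // from which Ignite derives the per-operation timeouts.
            cfg.setFailureDetectionTimeout(30_000);

            // Or tune the discovery SPI directly. Note: once ackTimeout or
            // socketTimeout is set explicitly, failureDetectionTimeout is
            // no longer applied to the discovery SPI.
            TcpDiscoverySpi spi = new TcpDiscoverySpi();
            spi.setAckTimeout(10_000);    // illustrative value
            spi.setSocketTimeout(10_000); // illustrative value
            cfg.setDiscoverySpi(spi);

            Ignition.start(cfg);
        }
    }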
On Fri, 30 Nov 2018 at 14:01, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi Ilya Kasnacheev,
>
> I am attaching all logs from the second server (10.201.30.64).
> Please let me know if you need any other details.
>
> Thanks and Regards,
> Hemasundar.
>
> On Fri, 30 Nov 2018 at 09:40, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
>> Hi Ilya Kasnacheev,
>>
>> We are running one cluster node (10.201.30.63). I am attaching all logs from this server.
>> Please let me know if you need any other details.
>>
>> Thanks and Regards,
>> Hemasundar.
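As for collecting the GC logs suggested above: on the Java 8 JVMs that Ignite 2.4 typically runs on, this is a matter of JVM flags, for example passed through the JVM_OPTS environment variable that ignite.sh picks up (the log path below is illustrative, and JVM_OPTS is an assumption about how these nodes are launched):

    export JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/ignite/gc.log \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime"

Long pauses reported under PrintGCApplicationStoppedTime are exactly the kind that would make a node miss discovery acknowledgements.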
Re: Ignite cluster going down frequently
Hello!

It is not clear from this log alone why this node became segmented. Do you have the log from the other server node in the topology? It was the coordinator, so maybe it was the one experiencing problems.

Regards,
--
Ilya Kasnacheev

On Wed, 28 Nov 2018 at 13:56, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Did you get a chance to go through the attached log?
Re: Ignite cluster going down frequently
Hi Ilya Kasnacheev,

Did you get a chance to go through the attached log?
This is one of the critical issues we are facing in our dev environment. Your input would be a great help in finding what is causing this issue and a probable solution to it.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 16:54, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi Ilya Kasnacheev,
> I have attached the log file.
>
> Regards,
> Hemasundar.
>
> On Mon, 26 Nov 2018 at 16:50, Ilya Kasnacheev wrote:
>> It's hard to say; can you provide more logs from the node before it segments?
Re: Ignite cluster going down frequently
Hello!

Maybe you have some data in your caches that causes runaway heap usage in your own code. Previously you did not have such data, or code that would react in such a fashion.

It's hard to say; can you provide more logs from the node before it segments?

Regards,
--
Ilya Kasnacheev

On Mon, 26 Nov 2018 at 14:17, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> The grid-down issue happens after it has been running successfully for 2 to 3 days.
Re: Ignite cluster going down frequently
Thank you very much, Ilya Kasnacheev, for your response.

We load the data initially; after that, only small delta changes are applied.
The grid-down issue happens after it has been running successfully for 2 to 3 days.
Once the issue starts, it repeats frequently, and we have not found any clue.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 13:43, Ilya Kasnacheev wrote:
> A node will get segmented if other nodes fail to wait for a Discovery response from that node.
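Relevant to the initial load described above, and to the advice elsewhere in this thread to keep the load from causing heap usage spikes: the usual way to bulk-load Ignite without building large on-heap batches is IgniteDataStreamer. A minimal sketch; the cache name, key/value types, and configuration path are hypothetical, not taken from the attached files:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class InitialLoad {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start("ignite-config.xml");
                 IgniteDataStreamer<Long, String> st = ignite.dataStreamer("staticDataCache")) {
                st.perNodeBufferSize(1024); // keep per-node buffers modest
                for (long i = 0; i < 1_000_000; i++)
                    st.addData(i, "value-" + i); // batched and flushed internally
            } // close() flushes whatever is still buffered
        }
    }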
Re: Ignite cluster going down frequently
Hello!

A node will get segmented if other nodes fail to wait for a Discovery response from that node. This usually means either network problems or long GC pauses caused by insufficient heap on one of the nodes.

Make sure your data load process does not cause heap usage spikes.

Regards,
--
Ilya Kasnacheev

On Fri, 23 Nov 2018 at 07:54, Hemasundara Rao <hemasundara@travelcentrictechnology.com> wrote:
> Hi All,
> We are running a two-node Ignite server cluster.
> It was running without any issue for almost 5 days. We are using this grid for static data. The Ignite process runs with around 8 GB of memory after we load our data.
> Suddenly the grid server nodes are going down; we tried 3 times to run the server nodes and load the static data, and the server nodes keep going down again and again.
>
> Please let us know how to overcome this kind of issue.
>
> Attached are the log file and configuration file.
>
> Following is part of the log from one server:
>
> [04:45:58,335][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
> [04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
> [04:45:58,335][INFO][tcp-disco-sock-reader-#78%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:36695, rmtPort=36695]
> [04:45:58,337][INFO][tcp-disco-sock-reader-#70%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.172:58418, rmtPort=58418]
> [04:45:58,337][INFO][tcp-disco-sock-reader-#74%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.10.125:63403, rmtPort=63403]
> [04:46:01,516][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 6a603d8b-f8bf-40bf-af50-6c04a56b572e
> [04:46:01,546][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=6a603d8b-f8bf-40bf-af50-6c04a56b572e, res=true, time=49ms]
> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 5ec6ee69-075e-4829-84ca-ae40411c7bc3
> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=5ec6ee69-075e-4829-84ca-ae40411c7bc3, res=false, time=7ms]
> [04:46:08,283][INFO][tcp-disco-sock-reader-#4%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:48038, rmtPort=48038]
> [04:46:08,367][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Restarting JVM according to configured segmentation policy.
> [04:46:08,388][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=680, servers=1, clients=17, CPUs=36, offheap=8.0GB, heap=84.0GB]
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Data Regions Configured:
> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]   ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB, persistenceEnabled=false]
> [04:46:08,396][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, customEvt=null, allowMerge=true]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], waitTime=0ms, futInfo=NA]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=680, minorTopVer=0]]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridCachePartitionExchangeManager] Stop merge, custom task found: WalStateNodeLeaveExchangeTask [node=TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]]
> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] finishExchang
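A side note on the "Restarting JVM according to configured segmentation policy" line in the log above: that behaviour corresponds to SegmentationPolicy.RESTART_JVM in the node configuration. A minimal sketch; the policy value is inferred from the log message, since the attached configuration file is not shown in the thread:

    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

    public class SegmentationConfig {
        public static IgniteConfiguration configure() {
            IgniteConfiguration cfg = new IgniteConfiguration();
            // RESTART_JVM makes a segmented node restart itself; STOP and
            // NOOP are the alternatives for handling segmentation manually.
            cfg.setSegmentationPolicy(SegmentationPolicy.RESTART_JVM);
            return cfg;
        }
    }

Whatever the policy, the discovery timeouts discussed earlier still need to be large enough that ordinary GC pauses do not get a node segmented in the first place.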