Re: Nodes restarting automatically

David Pilato Thu, 29 May 2014 01:52:30 -0700

I think, but I might be wrong, that this node, being unresponsive, no longer 
collects GC data.
Maybe you could look further back, before things started to get worse.
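
One way to look further back is the node's own log. A minimal sketch, assuming each monitor.jvm entry sits on a single line in the log file (the archive wraps them) and has the same shape as the entries quoted further down in this thread. The function name, the sample list, and the 1-second threshold are illustrative choices, not part of Elasticsearch:

```python
import re

# Matches the timestamp and the "[gc][<pool>][n][n] duration [<value><unit>]"
# fragment of a monitor.jvm log entry, as seen in the entries quoted in this thread.
GC_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\]"
    r".*?\[monitor\.jvm\s*\].*?"
    r"\[gc\]\[(?P<pool>\w+)\]\[\d+\]\[\d+\] duration \[(?P<value>[\d.]+)(?P<unit>ms|s|m)\]"
)

# Seconds per unit for the duration suffixes handled here (assumed: ms, s, m).
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def long_gc_pauses(lines, pool="old", min_seconds=1.0):
    """Return (timestamp, pool, seconds) for GC entries of `pool` lasting at least min_seconds."""
    found = []
    for line in lines:
        match = GC_RE.search(line)
        if not match or match.group("pool") != pool:
            continue
        seconds = float(match.group("value")) * UNIT_SECONDS[match.group("unit")]
        if seconds >= min_seconds:
            found.append((match.group("ts"), match.group("pool"), seconds))
    return found


# Two entries from this thread, unwrapped onto single lines and shortened after the duration.
SAMPLE = [
    "[2014-05-29 09:30:57,235][INFO ][monitor.jvm              ] [elastic ASIC nodo 3] "
    "[gc][young][129][20] duration [745ms], collections [1]/[1s]",
    "[2014-05-29 09:45:36,322][WARN ][monitor.jvm              ] [elastic ASIC nodo 3] "
    "[gc][old][964][1] duration [29.5s], collections [1]/[30.4s]",
]

if __name__ == "__main__":
    for ts, gc_pool, seconds in long_gc_pauses(SAMPLE):
        print(ts, gc_pool, seconds)
```

To scan a real file, pass the open file object instead of SAMPLE, since the function only iterates over lines.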



--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 29 May 2014 at 10:43, Jorge Ferrando <jorfe...@gmail.com> wrote:

This is what Marvel shows for old GC in the last 6 hours for that node:

<image.png>


> On Thu, May 29, 2014 at 10:39 AM, David Pilato <da...@pilato.fr> wrote:
> It sounds like the old GC is not able to free enough old gen space.
> I guess that if you look at your Marvel dashboards, you can see that in the 
> old GC charts.
> 
> So memory pressure is the first guess. You may have too many old GC cycles.
> 
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
> 
> On 29 May 2014 at 10:32, Jorge Ferrando <jorfe...@gmail.com> wrote:
> 
> Thanks for the answer David
> 
> I added these settings to elasticsearch.yml some days ago to see if that was 
> the problem:
> 
> discovery.zen.ping.timeout: 5s
> discovery.zen.fd.ping_interval: 5s
> discovery.zen.fd.ping_timeout: 60s
> discovery.zen.fd.ping_retries: 3
> 
> If I'm not mistaken, with those settings the node should be marked as 
> unavailable after 3m, and most of the time it happens sooner. Am I wrong?
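
As a rough check of that 3m figure, here is a minimal sketch of the arithmetic. It assumes the node is dropped only after `ping_retries` consecutive pings have each waited the full `ping_timeout`. The function name is hypothetical, and the model ignores `ping_interval` and any other way a node can leave the cluster:

```python
def fd_worst_case_seconds(ping_timeout_s, ping_retries):
    """Rough upper bound on detection time: every retry waits the full timeout."""
    return ping_timeout_s * ping_retries


# Values taken from the discovery.zen.fd.* settings quoted above.
PING_TIMEOUT_S = 60
PING_RETRIES = 3

# Longest old GC pause reported in the log excerpt in this thread.
OLD_GC_PAUSE_S = 29.5

if __name__ == "__main__":
    budget = fd_worst_case_seconds(PING_TIMEOUT_S, PING_RETRIES)
    print("worst-case detection time:", budget, "seconds")
    print("pause longer than one ping timeout:", OLD_GC_PAUSE_S > PING_TIMEOUT_S)
```

Under this simplified model the budget is 180 seconds, which matches the 3m estimate, and the 29.5s pause is shorter than even a single 60s ping timeout.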
> 
> 
>> On Thu, May 29, 2014 at 10:29 AM, David Pilato <da...@pilato.fr> wrote:
>> I think GC took too long, so your node became unresponsive.
>> If you set 30 GB of RAM, you should increase the ping timeout setting that 
>> controls when a node is marked as unresponsive.
>> 
>> And if you are under memory pressure, you could check your requests and 
>> see whether you can optimize them, or start new nodes...
>> 
>> My 2 cents.
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>> 
>> On 29 May 2014 at 09:56, Jorge Ferrando <jorfe...@gmail.com> wrote:
>> 
>> I've been analyzing the problem with Marvel and Nagios and I managed to get 
>> 2 more details:
>> 
>> - The node that restarts/reinitializes is always the same one: node 3.
>> - It always happens shortly after the cluster reaches green state, 
>> between a few seconds and 2-3 minutes later.
>> 
>> I have debug mode on in logging.yml:
>> 
>> logger:
>>   # log action execution errors for easier debugging
>>   action: DEBUG
>> 
>> But I don't see anything in the log. For instance, the last time it 
>> happened, the cluster became green at around 9:47 and the node restarted at 
>> 9:50:
>> 
>> [2014-05-29 09:30:57,235][INFO ][monitor.jvm              ] [elastic ASIC 
>> nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total 
>> [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young] 
>> [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] 
>> [463.1mb]->[524.1mb]/[29.3gb]}
>> [2014-05-29 09:45:36,322][WARN ][monitor.jvm              ] [elastic ASIC 
>> nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total 
>> [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young] 
>> [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old] 
>> [5gb]->[4.2gb]/[29.3gb]}
>> [2014-05-29 09:50:41,040][INFO ][node                     ] [elastic ASIC 
>> nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
>> [2014-05-29 09:50:41,041][INFO ][node                     ] [elastic ASIC 
>> nodo 3] initializing ...
>> [2014-05-29 09:50:41,063][INFO ][plugins                  ] [elastic ASIC 
>> nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk, 
>> head]
>> [2014-05-29 09:50:47,908][INFO ][node                     ] [elastic ASIC 
>> nodo 3] initialized
>> [2014-05-29 09:50:47,909][INFO ][node                     ] [elastic ASIC 
>> nodo 3] starting ...
>> 
>> Is there any other way of debugging what's going on with that node? 
>> 
>> 
>> 
>> 
>>> On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando <jorfe...@gmail.com> wrote:
>>> I thought about that, but it would be strange because they are 3 virtual 
>>> machines in the same VMware cluster with hundreds of other services, and 
>>> nobody has reported any networking problem.
>>> 
>>> 
>>>> On Thu, May 22, 2014 at 3:16 PM, emeschitc <emesch...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I may be wrong but it seems to me you have a problem with your network. It 
>>>> may be a flaky connection, a broken NIC, or something wrong with your 
>>>> configuration for discovery and/or data transport?
>>>> 
>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic 
>>>> ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>    at 
>>>> org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>    at 
>>>> org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>    at 
>>>> org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>> 
>>>> Check the status of the network on this node.
>>>> 
>>>> 
>>>> 
>>>>> On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users] 
>>>>> <[hidden email]> wrote:
>>>>> Hello 
>>>>> 
>>>>> We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and 
>>>>> Elasticsearch v1.1.1.
>>>>> 
>>>>> It had been running flawlessly, but since last week some of the nodes 
>>>>> restart randomly and the cluster goes to red state, then yellow, then 
>>>>> green, and it happens again in a loop (sometimes it doesn't even reach 
>>>>> green state).
>>>>> 
>>>>> I've tried to look at the logs but I can't find an obvious reason for 
>>>>> what is going on.
>>>>> 
>>>>> I've found entries like these, but I don't know if they are in some way 
>>>>> related to the crash:
>>>>> 
>>>>> [2014-05-22 13:55:16,150][WARN ][index.codec              ] [elastic ASIC 
>>>>> nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] 
>>>>> returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC 
>>>>> nodo 3] [logstash-2014.05.22] no index mapper found for field: 
>>>>> [date_end.raw] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC 
>>>>> nodo 3] [logstash-2014.05.22] no index mapper found for field: 
>>>>> [date_start] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC 
>>>>> nodo 3] [logstash-2014.05.22] no index mapper found for field: 
>>>>> [date_start.raw] returning default postings format
>>>>> 
>>>>> 
>>>>> For instance, just now it was in yellow state, really close to reaching 
>>>>> green state, when suddenly node 3 restarted by itself, and now the cluster 
>>>>> is red with 2000 shards initializing. The log on that node shows this:
>>>>> 
>>>>> [2014-05-22 13:59:48,498][INFO ][monitor.jvm              ] [elastic ASIC 
>>>>> nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], 
>>>>> total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools 
>>>>> {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] 
>>>>> [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
>>>>> [2014-05-22 14:03:44,825][INFO ][node                     ] [elastic ASIC 
>>>>> nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
>>>>> [2014-05-22 14:03:44,826][INFO ][node                     ] [elastic ASIC 
>>>>> nodo 3] initializing ...
>>>>> [2014-05-22 14:03:44,839][INFO ][plugins                  ] [elastic ASIC 
>>>>> nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
>>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic ASIC 
>>>>> nodo 3] initialized
>>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic ASIC 
>>>>> nodo 3] starting ...
>>>>> 
>>>>> The crash happened exactly at 14:02.
>>>>> 
>>>>> Any idea what could be going on, or how I can trace what's happening?
>>>>> 
>>>>> After rebooting there are also DEBUG errors like this:
>>>>> 
>>>>> [2014-05-22 14:06:16,621][DEBUG][action.search.type       ] [elastic ASIC 
>>>>> nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], 
>>>>> s[STARTED]: Failed to execute 
>>>>> [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
>>>>> org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC 
>>>>> nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
>>>>>   at 
>>>>> org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>>>>>   at 
>>>>> org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>>>>>   at 
>>>>> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
>>>>>   at 
>>>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
>>>>>   at 
>>>>> org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>   at 
>>>>> org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
>>>>>   at 
>>>>> org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
>>>>>   at 
>>>>> org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>   at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
>>>>>   at 
>>>>> org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
>>>>>   at 
>>>>> org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
>>>>>   at 
>>>>> org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
>>>>>   at 
>>>>> org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
>>>>>   at 
>>>>> org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
>>>>>   at 
>>>>> org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>>>>>   at 
>>>>> org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>>>>>   at 
>>>>> org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>>   at 
>>>>> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>   at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: 
>>>>> [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>>   at 
>>>>> org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>>   at 
>>>>> org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>>   at 
>>>>> org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>>   ... 50 more
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to [hidden email].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
