I think, though I might be wrong, that once this node became unresponsive it stopped collecting GC data. Maybe you could look further back in time, before things started to get worse.
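If Marvel has a gap for that period, the `monitor.jvm` lines in the node's own log file should still be there. Here is a minimal sketch in Python for pulling the long old-generation collections out of such a log. It assumes the log line layout shown in the quoted messages below; the function names and the 5-second threshold are my own choices for illustration, not anything shipped with Elasticsearch.

```python
import re
from datetime import datetime

# Matches monitor.jvm GC lines in the layout quoted in this thread, e.g.
# [2014-05-29 09:45:36,322][WARN ][monitor.jvm ] [node] [gc][old][964][1] duration [29.5s], ...
GC_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[monitor\.jvm\s*\].*?"
    r"\[gc\]\[(?P<gen>young|old)\]\[\d+\]\[\d+\] "
    r"duration \[(?P<value>[\d.]+)(?P<unit>ms|s|m|h)\]"
)

# Conversion factors for the duration suffixes seen in the log (assumed set).
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_gc_line(line):
    """Return (timestamp, generation, duration_in_seconds), or None if not a GC line."""
    match = GC_LINE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    seconds = float(match.group("value")) * UNIT_SECONDS[match.group("unit")]
    return when, match.group("gen"), seconds


def long_old_gcs(lines, threshold_s=5.0):
    """Keep only old-generation collections that lasted at least threshold_s seconds."""
    found = []
    for line in lines:
        parsed = parse_gc_line(line)
        if parsed is not None and parsed[1] == "old" and parsed[2] >= threshold_s:
            found.append(parsed)
    return found


if __name__ == "__main__":
    # Two lines copied from the log excerpt quoted in this thread (tails shortened).
    sample = [
        "[2014-05-29 09:30:57,235][INFO ][monitor.jvm ] [elastic ASIC nodo 3] "
        "[gc][young][129][20] duration [745ms], collections [1]/[1s]",
        "[2014-05-29 09:45:36,322][WARN ][monitor.jvm ] [elastic ASIC nodo 3] "
        "[gc][old][964][1] duration [29.5s], collections [1]/[30.4s]",
    ]
    for when, gen, seconds in long_old_gcs(sample):
        print(when.isoformat(), gen, seconds)
```

To run it over a real log, pass an open file instead of the sample list, for example `long_old_gcs(open(path))`, where `path` is wherever your node writes its log.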
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 29 May 2014 at 10:43, Jorge Ferrando <jorfe...@gmail.com> wrote:

This is what Marvel shows for old GC in the last 6 hours for that node:

<image.png>

> On Thu, May 29, 2014 at 10:39 AM, David Pilato <da...@pilato.fr> wrote:
> It sounds like the old GC is not able to clean old gen space enough.
> I guess that if you look at your Marvel dashboards, you can see that on old
> GC.
>
> So memory pressure is the first guess. You may have too many old GC cycles.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 29 May 2014 at 10:32, Jorge Ferrando <jorfe...@gmail.com> wrote:
>
> Thanks for the answer, David.
>
> I added these settings to elasticsearch.yml some days ago to see if that was
> the problem:
>
> discovery.zen.ping.timeout: 5s
> discovery.zen.fd.ping_interval: 5s
> discovery.zen.fd.ping_timeout: 60s
> discovery.zen.fd.ping_retries: 3
>
> If I'm not mistaken, with those settings the node should be marked as
> unavailable after 3m, and most of the time it happens quicker. Am I wrong?
>
>> On Thu, May 29, 2014 at 10:29 AM, David Pilato <da...@pilato.fr> wrote:
>> GC took too much time, so your node became unresponsive, I think.
>> If you set 30 GB of RAM, you should increase the ping timeout setting before a
>> node is marked as unresponsive.
>>
>> And if you are under memory pressure, you could check your requests
>> and see if you can optimize them, or start new nodes...
>>
>> My 2 cents.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> On 29 May 2014 at 09:56, Jorge Ferrando <jorfe...@gmail.com> wrote:
>>
>> I've been analyzing the problem with Marvel and Nagios and I managed to get
>> 2 more details:
>>
>> - The node restarting/reinitializing is always the same one: node 3.
>> - It always happens shortly after the cluster reaches green state,
>>   between a few seconds and 2-3 minutes later.
>>
>> I have debug mode on in logging.yml:
>>
>> logger:
>>   # log action execution errors for easier debugging
>>   action: DEBUG
>>
>> But I don't see anything in the log. For instance, the last time it
>> happened, the cluster became green at around 9:47 and the node restarted at 9:50:
>>
>> [2014-05-29 09:30:57,235][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young] [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [463.1mb]->[524.1mb]/[29.3gb]}
>> [2014-05-29 09:45:36,322][WARN ][monitor.jvm ] [elastic ASIC nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young] [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old] [5gb]->[4.2gb]/[29.3gb]}
>> [2014-05-29 09:50:41,040][INFO ][node ] [elastic ASIC nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
>> [2014-05-29 09:50:41,041][INFO ][node ] [elastic ASIC nodo 3] initializing ...
>> [2014-05-29 09:50:41,063][INFO ][plugins ] [elastic ASIC nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk, head]
>> [2014-05-29 09:50:47,908][INFO ][node ] [elastic ASIC nodo 3] initialized
>> [2014-05-29 09:50:47,909][INFO ][node ] [elastic ASIC nodo 3] starting ...
>>
>> Is there any other way of debugging what's going on with that node?
>>
>>> On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando <jorfe...@gmail.com> wrote:
>>> I thought about that, but it would be strange because they are 3 virtual
>>> machines in the same VMware cluster with hundreds of other services and
>>> nobody reported any networking problem.
>>>
>>>> On Thu, May 22, 2014 at 3:16 PM, emeschitc <emesch...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I may be wrong, but it seems to me you have a problem with your network. It
>>>> may be a flaky connection, a broken NIC, or something wrong with your
>>>> configuration for discovery and/or data transport.
>>>>
>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>
>>>> Check the status of the network on this node.
>>>>
>>>>> On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users] <[hidden email]> wrote:
>>>>> Hello
>>>>>
>>>>> We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and
>>>>> elasticsearch v1.1.1.
>>>>>
>>>>> It had been running flawlessly, but since last week some of the nodes
>>>>> restart randomly and the cluster goes to red state, then yellow, then green,
>>>>> and it happens again in a loop (sometimes it doesn't even reach green state).
>>>>>
>>>>> I've tried to look at the logs but I can't find an obvious reason for
>>>>> what is going on.
>>>>>
>>>>> I've found entries like these, but I don't know if they are in some way
>>>>> related to the crash:
>>>>>
>>>>> [2014-05-22 13:55:16,150][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format
>>>>>
>>>>> For instance, right now it was in yellow state, really close to reaching
>>>>> green, and suddenly node 3 restarted by itself and now the cluster is red with
>>>>> 2000 shards initializing. The log on that node shows this:
>>>>>
>>>>> [2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
>>>>> [2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
>>>>> [2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...
>>>>> [2014-05-22 14:03:44,839][INFO ][plugins ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
>>>>> [2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] initialized
>>>>> [2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] starting ...
>>>>>
>>>>> The crash happened exactly at 14:02.
>>>>>
>>>>> Any idea what can be going on, or how I can trace what's happening?
>>>>>
>>>>> After rebooting there are also DEBUG errors like this:
>>>>>
>>>>> [2014-05-22 14:06:16,621][DEBUG][action.search.type ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
>>>>> org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
>>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>>>>>     at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
>>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
>>>>>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
>>>>>     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
>>>>>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>     at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
>>>>>     at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
>>>>>     at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
>>>>>     at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
>>>>>     at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
>>>>>     at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
>>>>>     at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>>>>>     at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>>>>>     at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>     at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>>     at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>     at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>>     at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>>     at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>>     ... 50 more
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>> email to [hidden email].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> If you reply to this email, your message will be added to the discussion
>>>>> below:
>>>>> http://elasticsearch-users.115913.n3.nabble.com/Nodes-restarting-automatically-tp4056276.html
>>>>> To unsubscribe from ElasticSearch Users, click here.