A different message in the log after another crash:

[2014-05-23 14:17:11,580][WARN ][transport.netty ] [elastic ASIC nodo 3] exception caught on transport layer [[id: 0xc5d07c82, /158.42.250.192:59864 :> /158.42.250.79:9301]], closing connection
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
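
One way to trace when the restarts happen is to scan a node's log for startup entries and measure how long the log was silent before each one. Below is a minimal sketch in Python, assuming the log layout shown in this thread ([timestamp][LEVEL][component] message) and assuming that a startup is marked by the [node] entry carrying version[...] and pid[...]. The names parse_line, find_restarts and SAMPLE are made up for illustration; SAMPLE reuses three log lines quoted in this thread.

```python
import re
from datetime import datetime

# Leading "[2014-05-22 14:03:44,825][INFO ][node ]" part of a log entry.
# Assumption: every entry starts this way; anything else (stack frames,
# wrapped text) is treated as a continuation line and skipped.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]"  # timestamp, millis
    r"\[(\w+)\s*\]"                                        # level
    r"\[([\w.]+)\s*\]\s*(.*)$"                             # component, message
)


def parse_line(line):
    """Return (timestamp, level, component, message), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(microsecond=int(m.group(2)) * 1000)
    return ts, m.group(3), m.group(4), m.group(5)


def find_restarts(lines):
    """Return a list of (last_entry_before, startup_time, gap_seconds)."""
    restarts = []
    prev_ts = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, component, message = parsed
        # Assumed startup marker: the [node] entry with version[...] and pid[...].
        is_startup = (component == "node"
                      and "version[" in message
                      and "pid[" in message)
        if is_startup and prev_ts is not None:
            restarts.append((prev_ts, ts, (ts - prev_ts).total_seconds()))
        prev_ts = ts
    return restarts


# Three entries copied from the node 3 log quoted in this thread.
SAMPLE = [
    "[2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] "
    "[gc][young][1181][222] duration [735ms], collections [1]/[1s], "
    "total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb]",
    "[2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] "
    "version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]",
    "[2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...",
]

if __name__ == "__main__":
    for before, started, gap in find_restarts(SAMPLE):
        print(f"startup at {started}, last entry before it at {before}, "
              f"silent for {gap:.1f}s")
```

On those sample lines it finds a single startup at 14:03:44, preceded by a silent gap of a little under four minutes, which is consistent with the crash seen at 14:02. Running something like this over the full log of each node might show whether the restarts line up across nodes or with anything in syslog.
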

On Thu, May 22, 2014 at 2:34 PM, Jorge Ferrando <jorfe...@gmail.com> wrote:

> I've been checking syslog on all of the nodes and I found no mention of
> oom, process killed, out of memory or anything similar...
>
> Just in case, I ran these commands on the 3 nodes and the problem persists:
>
> echo "0" > /proc/sys/vm/oom-kill
> echo 1 > /proc/sys/vm/overcommit_memory
> echo 100 > /proc/sys/vm/overcommit_ratio
>
>
> On Thu, May 22, 2014 at 2:16 PM, Nikolas Everett <nik9...@gmail.com> wrote:
>
>> Like Mark said, check the OOM killer. It should log to syslog. It is
>> evil.
>>
>> Nik
>>
>>
>> On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando <jorfe...@gmail.com> wrote:
>>
>>> elasticsearch nodes are launched through /etc/init.d/elasticsearch
>>>
>>>
>>> On Thu, May 22, 2014 at 2:13 PM, Mark Walkom
>>> <ma...@campaignmonitor.com> wrote:
>>>
>>>> How are you running the service: upstart, init or something else?
>>>>
>>>> ES shouldn't just restart on its own; this could be something else,
>>>> like the kernel's OOM killer.
>>>>
>>>> Regards,
>>>> Mark Walkom
>>>>
>>>> Infrastructure Engineer
>>>> Campaign Monitor
>>>> email: ma...@campaignmonitor.com
>>>> web: www.campaignmonitor.com
>>>>
>>>>
>>>> On 22 May 2014 22:07, Jorge Ferrando <jorfe...@gmail.com> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and
>>>>> elasticsearch v1.1.1.
>>>>>
>>>>> It has been running flawlessly, but since last week some of the nodes
>>>>> restart randomly and the cluster goes to red state, then yellow, then
>>>>> green, and it happens again in a loop (sometimes it doesn't even reach
>>>>> green state).
>>>>>
>>>>> I've tried to look at the logs but I can't find an obvious reason for
>>>>> what is going on.
>>>>>
>>>>> I've found entries like these, but I don't know if they are in some
>>>>> way related to the crash:
>>>>>
>>>>> [2014-05-22 13:55:16,150][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
>>>>> [2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format
>>>>>
>>>>>
>>>>> For instance, right now it was in yellow state, really close to
>>>>> reaching green, when node 3 suddenly restarted by itself, and now the
>>>>> cluster is red with 2000 shards initializing.
>>>>> The log on that node shows this:
>>>>>
>>>>> [2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
>>>>> [2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
>>>>> [2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...
>>>>> [2014-05-22 14:03:44,839][INFO ][plugins ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
>>>>> [2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] initialized
>>>>> [2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] starting ...
>>>>>
>>>>> The crash happened exactly at 14:02.
>>>>>
>>>>> Any idea what could be going on, or how I can trace what's happening?
>>>>>
>>>>> After rebooting there are also DEBUG errors like this:
>>>>>
>>>>> [2014-05-22 14:06:16,621][DEBUG][action.search.type ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
>>>>> org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
>>>>>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>>>>>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>>>>>         at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
>>>>>         at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
>>>>>         at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>         at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
>>>>>         at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
>>>>>         at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>>         at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
>>>>>         at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
>>>>>         at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
>>>>>         at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
>>>>>         at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
>>>>>         at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
>>>>>         at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>>>>>         at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>>>>>         at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>>>>>         at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>         at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>>>>>         at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>         at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>>         at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>>         at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>>         at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>>         at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>         at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>         at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>>         at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>>         at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>>         at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>>         at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>>         at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>>         at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>>         at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>>         at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>>         at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>>         at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>>         at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>>         ... 50 more
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.