Re: Nodes restarting automatically
I've been analyzing the problem with Marvel and nagios and I managed to get two more details:

- The node that restarts/reinitializes is always the same one: node 3.
- It always happens shortly after the cluster reaches green state, between a few seconds and 2-3 minutes later.

I have debug mode on in logging.yml:

logger:
  # log action execution errors for easier debugging
  action: DEBUG

But I don't see anything in the log. For instance, the last time it happened the cluster became green at around 9:47 and the node restarted at 9:50:

[2014-05-29 09:30:57,235][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young] [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [463.1mb]->[524.1mb]/[29.3gb]}
[2014-05-29 09:45:36,322][WARN ][monitor.jvm ] [elastic ASIC nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young] [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old] [5gb]->[4.2gb]/[29.3gb]}
[2014-05-29 09:50:41,040][INFO ][node ] [elastic ASIC nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
[2014-05-29 09:50:41,041][INFO ][node ] [elastic ASIC nodo 3] initializing ...
[2014-05-29 09:50:41,063][INFO ][plugins ] [elastic ASIC nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-29 09:50:47,908][INFO ][node ] [elastic ASIC nodo 3] initialized
[2014-05-29 09:50:47,909][INFO ][node ] [elastic ASIC nodo 3] starting ...

Is there any other way of debugging what's going on with that node?

On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando jorfe...@gmail.com wrote:

I thought about that, but it would be strange because they are 3 virtual machines in the same VMware cluster with hundreds of other services, and nobody has reported any networking problem.

On Thu, May 22, 2014 at 3:16 PM, emeschitc emesch...@gmail.com wrote:

Hi, I may be wrong, but it seems to me you have a problem with your network. It may be a flaky connection, a broken NIC, or something wrong with your configuration for discovery and/or data transport:

Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)

Check the status of the network on this node.

On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users] wrote:

[quotes the original "Nodes restarting automatically" post in full; see below]
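For reference, a few more ways to watch that node from the command line, sketched against the 1.x REST APIs (the localhost:9200 endpoint is an assumption; point it at node 3):

# JVM heap and GC counters, to catch long old-gen collections as they build up:
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
# What the busiest threads are doing in the seconds after the cluster goes green:
curl -s 'http://localhost:9200/_nodes/hot_threads'
# Process info; a changed pid confirms a real process restart rather than a rejoin:
curl -s 'http://localhost:9200/_nodes/process?pretty'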
Re: Nodes restarting automatically
Thanks for the answer David.

I added these settings to elasticsearch.yml some days ago to see if that was the problem:

discovery.zen.ping.timeout: 5s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3

If I'm not mistaken, with those settings a node should only be marked as unavailable after 3 minutes (the 60s ping_timeout times 3 retries), and most of the time the restart happens quicker than that. Am I wrong?

On Thu, May 29, 2014 at 10:29 AM, David Pilato da...@pilato.fr wrote:

GC took too much time, so your node became unresponsive, I think. If you set 30 GB RAM, you should increase the ping timeout setting before a node is marked as unresponsive. And if you are under memory pressure, you could try to check your requests and see if you can do some optimization, or start new nodes... My 2 cents.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 29 May 2014 at 09:56, Jorge Ferrando jorfe...@gmail.com wrote:

[quotes the May 29 Marvel/nagios analysis, its GC logs, and the earlier network discussion in full; see above]
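The timing question can be checked directly against the JVM. A sketch (pid 7021 is taken from the restart log above; substitute the current one):

# Worst-case failure detection with the settings above, on this reading:
#   ping_timeout (60s) x ping_retries (3) = 180s before the master drops the node.
# Sample heap occupancy and accumulated GC time every 5 seconds to see whether
# a single old-gen collection actually spans that window:
jstat -gcutil 7021 5000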
Re: Nodes restarting automatically
Different message in the log after another crash. The IOException text is the JVM's Spanish-locale rendering of "Connection reset by peer":

[2014-05-23 14:17:11,580][WARN ][transport.netty ] [elastic ASIC nodo 3] exception caught on transport layer [[id: 0xc5d07c82, /158.42.250.192:59864 : /158.42.250.79:9301]], closing connection
java.io.IOException: Conexión reinicializada por la máquina remota
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

On Thu, May 22, 2014 at 2:34 PM, Jorge Ferrando jorfe...@gmail.com wrote:

[quotes the syslog/OOM-killer discussion with Nikolas Everett and Mark Walkom, and the original post, all reproduced in full below]
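If the resets are network-level rather than GC-induced, they should be visible outside the JVM as well. A sketch using the addresses from the trace above:

# Can this node reach node 2's transport port right now?
nc -zv 158.42.250.79 9301
# Watch established transport connections while the cluster rebalances;
# connections disappearing here line up with "Node not connected" in the log:
watch -n 2 'netstat -tn | grep 9301'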
Nodes restarting automatically
Hello,

We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and elasticsearch v1.1.1. It had been running flawlessly, but since last week some of the nodes restart randomly and the cluster goes to red state, then yellow, then green, and then it happens again in a loop (sometimes it doesn't even reach green state).

I've tried to look at the logs, but I can't find an obvious reason for what could be going on. I've found entries like these, but I don't know if they are in any way related to the crash:

[2014-05-22 13:55:16,150][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format

For instance, right now it was in yellow state, really close to reaching green, when node 3 suddenly restarted itself, and now the cluster is red with 2000 shards initializing. The log on that node shows this:

[2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
[2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...
[2014-05-22 14:03:44,839][INFO ][plugins ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] initialized
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] starting ...

The crash happened exactly at 14:02. Any idea what can be going on, or how I can trace what's happening?

After rebooting there are also DEBUG errors like this:

[2014-05-22 14:06:16,621][DEBUG][action.search.type ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
    at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    ...
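Two quick traces that would distinguish an external kill from a JVM-internal death, sketched with the stock Ubuntu log paths (an assumption):

# The OOM killer leaves a distinctive line in the kernel log when it fires:
grep -iE 'killed process|out of memory' /var/log/syslog /var/log/kern.log
# And polling cluster health timestamps the red/yellow/green transitions, so
# the restarts can be correlated against them:
curl -s 'http://localhost:9200/_cluster/health?pretty'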
Re: Nodes restarting automatically
elasticsearch nodes are launched through /etc/init.d/elasticsearch

On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote:

How are you running the service: upstart, init or something else? ES shouldn't just restart on its own; this could be something else, like the kernel's OOM killer.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 22 May 2014 22:07, Jorge Ferrando jorfe...@gmail.com wrote:

[quotes the original "Nodes restarting automatically" post in full; see above]
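An init.d script by itself will not respawn a dead JVM, so if the pid keeps changing, something else is restarting the process. A few hedged checks (paths are the stock Ubuntu locations):

ls /etc/init/elasticsearch.conf 2>/dev/null   # is there also an upstart job with a respawn stanza?
grep -r elasticsearch /etc/cron* 2>/dev/null  # any cron-driven watchdog or restart?
ps -C java -o pid,ppid,etime,cmd              # JVM uptime and which process is its parent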
Re: Nodes restarting automatically
I've been checking syslog on all of the nodes and I found no mention of oom, killed processes, out of memory or anything similar... Just in case, I ran these commands on the 3 nodes, and the problem persists:

echo 0 > /proc/sys/vm/oom-kill
echo 1 > /proc/sys/vm/overcommit_memory
echo 100 > /proc/sys/vm/overcommit_ratio

On Thu, May 22, 2014 at 2:16 PM, Nikolas Everett nik9...@gmail.com wrote:

Like Mark said, check the oomkiller. It should log to syslog. It is evil.

Nik

On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando jorfe...@gmail.com wrote:

elasticsearch nodes are launched through /etc/init.d/elasticsearch

On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote:

[quotes the upstart/init question and the original "Nodes restarting automatically" post in full; see above]
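A follow-up sketch to confirm those settings actually took, and to scan the kernel ring buffer, which keeps OOM-killer output even when syslog misses or rotates it:

cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio
dmesg | grep -iE 'oom|killed process|out of memory'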