Re: Nodes restarting automatically

2014-05-29 Thread Jorge Ferrando
I've been analyzing the problem with Marvel and nagios and I managed to get
2 more details:

- The node that restarts/reinitializes is always the same one: node 3
- It always happens shortly after the cluster reaches green state, between
a few seconds and 2-3 minutes later

I have debug mode on in logging.yml:

logger:
  # log action execution errors for easier debugging
  action: DEBUG

But I don't see anything in the log. For instance, the last time it
happened the cluster became green at around 9:47 and the node restarted at 9:50:

[2014-05-29 09:30:57,235][INFO ][monitor.jvm  ] [elastic ASIC
nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total
[745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young]
[421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old]
[463.1mb]->[524.1mb]/[29.3gb]}
[2014-05-29 09:45:36,322][WARN ][monitor.jvm  ] [elastic ASIC
nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total
[29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young]
[29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old]
[5gb]->[4.2gb]/[29.3gb]}
[2014-05-29 09:50:41,040][INFO ][node ] [elastic ASIC
nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
[2014-05-29 09:50:41,041][INFO ][node ] [elastic ASIC
nodo 3] initializing ...
[2014-05-29 09:50:41,063][INFO ][plugins  ] [elastic ASIC
nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk,
head]
[2014-05-29 09:50:47,908][INFO ][node ] [elastic ASIC
nodo 3] initialized
[2014-05-29 09:50:47,909][INFO ][node ] [elastic ASIC
nodo 3] starting ...
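Those monitor.jvm lines can be screened automatically for long pauses; a minimal sketch (the regex is written against the log format shown above, not an official parser):

```python
import re

# Pull the GC generation and pause duration out of a monitor.jvm log line
# like "[gc][old][964][1] duration [29.5s], collections [1]/[30.4s], ...".
line = ("[2014-05-29 09:45:36,322][WARN ][monitor.jvm  ] [elastic ASIC "
        "nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s]")
m = re.search(r"\[gc\]\[(\w+)\]\[\d+\]\[\d+\] duration \[([\d.]+)(m?s)\]", line)
gen, value, unit = m.group(1), float(m.group(2)), m.group(3)
seconds = value / 1000 if unit == "ms" else value  # normalize ms -> s
print(gen, seconds)  # old 29.5
```

A pause of 29.5 seconds in the old generation, as in this line, is long enough for other nodes to consider the node dead.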

Is there any other way of debugging what's going on with that node?
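If `action: DEBUG` stays silent, the same logging.yml section can raise other loggers instead; a sketch (logger names assumed from the 1.x package layout, and TRACE is very verbose):

```yaml
logger:
  # log action execution errors for easier debugging
  action: DEBUG
  # master election and fault-detection decisions
  discovery: DEBUG
  # node-to-node connect/disconnect events
  transport: TRACE
```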




On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando jorfe...@gmail.com wrote:

 I thought about that, but it would be strange: they are 3 virtual
 machines in the same VMware cluster alongside hundreds of other services,
 and nobody has reported any networking problem.


 On Thu, May 22, 2014 at 3:16 PM, emeschitc emesch...@gmail.com wrote:

 Hi,

 I may be wrong, but it seems to me you have a problem with your network.
 It may be a flaky connection, a broken NIC, or something wrong with your
 configuration for discovery and/or data transport.

 Caused by: org.elasticsearch.transport.NodeNotConnectedException:
 [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
  at
 org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
 at
 org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
  at
 org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)

 Check the status of the network on this node.
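 A quick check from the other nodes is a plain TCP connect to the transport port; a minimal sketch (the demo uses a throwaway local listener, but in practice you would point it at the address in the exception, e.g. 158.42.250.79:9301):

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Attempt a plain TCP connect -- the same handshake the transport layer needs."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener; replace with the remote
# node's transport address when diagnosing for real.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
demo_host, demo_port = srv.getsockname()
print(port_reachable(demo_host, demo_port))  # True
srv.close()
```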



 On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users]
 wrote:

 Hello

 We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64bits, and
 elasticsearch v1.1.1

 It's been running flawlessly, but since last week some of the nodes
 restart randomly and the cluster goes to red state, then yellow, then green,
 and it happens again in a loop (sometimes it doesn't even reach green state).

 I've tried to look at the logs, but I can't find an obvious reason for
 what's going on.

 I've found entries like these, but I don't know if they are in some way
 related to the crash:

 [2014-05-22 13:55:16,150][WARN ][index.codec  ] [elastic
 ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
 [date_end] returning default postings format
 [2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic
 ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
 [date_end.raw] returning default postings format
 [2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic
 ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
 [date_start] returning default postings format
 [2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic
 ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
 [date_start.raw] returning default postings format


 For instance, just now it was in yellow state, really close to reaching
 green, when node 3 suddenly restarted itself; now the cluster is red
 with 2000 shards initializing. The log on that node shows this:

 [2014-05-22 13:59:48,498][INFO ][monitor.jvm  ] [elastic
 ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s],
 total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young]
 [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old]
 [6gb]->[6gb]/[19.3gb]}
 [2014-05-22 14:03:44,825][INFO ][node ] [elastic
 ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
 [2014-05-22 14:03:44,826][INFO ][node ] [elastic

Re: Nodes restarting automatically

2014-05-29 Thread Jorge Ferrando
Thanks for the answer David

I added these settings to elasticsearch.yml some days ago to see whether
that was the problem:

discovery.zen.ping.timeout: 5s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3

If I'm not mistaken, with those settings a node should be marked as
unavailable after about 3 minutes (3 retries × 60s ping_timeout), and most of
the time it happens sooner than that. Am I wrong?
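That estimate works out with simple arithmetic; a rough model (assuming the node is declared dead only after every retry hits the full timeout, which simplifies the real zen fault-detection logic):

```python
# Rough worst-case time before a node is marked unavailable: one
# ping_interval wait plus `retries` pings that each hit ping_timeout.
def worst_case_detection_s(ping_interval_s, ping_timeout_s, retries):
    return ping_interval_s + retries * ping_timeout_s

print(worst_case_detection_s(5, 60, 3))  # 185 seconds, i.e. just over 3 minutes
```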


On Thu, May 29, 2014 at 10:29 AM, David Pilato da...@pilato.fr wrote:

 The GC took too much time, so I think your node became unresponsive.
 If you give the JVM 30 GB of RAM, you should increase the ping timeout
 setting before a node is marked as unresponsive.

 And if you are under memory pressure, you could try to review your requests
 and see if there is room for optimization, or start new nodes...

 My 2 cents.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 29 May 2014, at 09:56, Jorge Ferrando jorfe...@gmail.com wrote:

 [quoted text trimmed]

Re: Nodes restarting automatically

2014-05-23 Thread Jorge Ferrando
Different message in the log after another crash:

[2014-05-23 14:17:11,580][WARN ][transport.netty  ] [elastic ASIC
nodo 3] exception caught on transport layer [[id: 0xc5d07c82, /
158.42.250.192:59864 : /158.42.250.79:9301]], closing connection
java.io.IOException: Conexión reinicializada por la máquina remota ("Connection reset by peer")
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
 at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)


On Thu, May 22, 2014 at 2:34 PM, Jorge Ferrando jorfe...@gmail.com wrote:

 [quoted text trimmed]

Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
Hello 

We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64bits, and 
elasticsearch v1.1.1

It's been running flawlessly, but since last week some of the nodes
restart randomly and the cluster goes to red state, then yellow, then green,
and it happens again in a loop (sometimes it doesn't even reach green state).

I've tried to look at the logs, but I can't find an obvious reason for what
can be going on.

I've found entries like these, but I don't know if they are in some way 
related to the crash:

[2014-05-22 13:55:16,150][WARN ][index.codec  ] [elastic ASIC 
nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] 
returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic ASIC 
nodo 3] [logstash-2014.05.22] no index mapper found for field: 
[date_end.raw] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic ASIC 
nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] 
returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec  ] [elastic ASIC 
nodo 3] [logstash-2014.05.22] no index mapper found for field: 
[date_start.raw] returning default postings format


For instance, just now it was in yellow state, really close to reaching
green, when node 3 suddenly restarted itself; now the cluster is red with
2000 shards initializing. The log on that node shows this:

[2014-05-22 13:59:48,498][INFO ][monitor.jvm  ] [elastic ASIC
nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s],
total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young]
[456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old]
[6gb]->[6gb]/[19.3gb]}
[2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC 
nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC 
nodo 3] initializing ...
[2014-05-22 14:03:44,839][INFO ][plugins  ] [elastic ASIC 
nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC 
nodo 3] initialized
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC 
nodo 3] starting ...

The crash happened exactly at 14:02.

Any idea what can be going on, or how can I trace what's happening?

After rebooting there are also DEBUG errors like this:

[2014-05-22 14:06:16,621][DEBUG][action.search.type   ] [elastic ASIC 
nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], 
s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC 
nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
at 
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at 
org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
at 
org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
at 
org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
at 
org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
at 
org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
at 
org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
at 
org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
at 
org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
at 

Re: Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
elasticsearch nodes are launched through /etc/init.d/elasticsearch


On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote:

 How are you running the service: upstart, init, or something else?

 ES shouldn't just restart on its own; this could be something else, like
 the kernel's OOM killer.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 22 May 2014 22:07, Jorge Ferrando jorfe...@gmail.com wrote:

 [quoted text trimmed]

Re: Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
I've been checking syslog on all of the nodes and I found no mention of
oom, process killed, out of memory, or anything similar...

Just in case, I ran these commands on the 3 nodes, and the problem persists:

echo 0 > /proc/sys/vm/oom-kill
echo 1 > /proc/sys/vm/overcommit_memory
echo 100 > /proc/sys/vm/overcommit_ratio
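For reference, a successful OOM kill usually leaves lines containing "oom-killer" or "Out of memory: Kill process" in syslog or dmesg; a small scan sketch (the sample lines below are made up for illustration -- in practice, read /var/log/syslog or `dmesg` output):

```python
import re

# Hypothetical syslog excerpt; substitute the real file contents.
lines = [
    "May 22 14:02:01 node3 kernel: java invoked oom-killer: gfp_mask=0x201da",
    "May 22 14:02:01 node3 kernel: Out of memory: Kill process 7511 (java) score 912",
    "May 22 14:05:00 node3 CRON[123]: (root) CMD (command)",
]
pattern = re.compile(r"oom-killer|Out of memory|Killed process", re.IGNORECASE)
hits = [l for l in lines if pattern.search(l)]
print(len(hits))  # 2
```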


On Thu, May 22, 2014 at 2:16 PM, Nikolas Everett nik9...@gmail.com wrote:

 Like Mark said, check the OOM killer. It should log to syslog. It is
 evil.

 Nik


 On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando jorfe...@gmail.com wrote:

 elasticsearch nodes are launched through /etc/init.d/elasticsearch


 On Thu, May 22, 2014 at 2:13 PM, Mark Walkom
 ma...@campaignmonitor.com wrote:

 How are you running the service: upstart, init, or something else?

 ES shouldn't just restart on its own; this could be something else, like
 the kernel's OOM killer.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 22 May 2014 22:07, Jorge Ferrando jorfe...@gmail.com wrote:

 [quoted text trimmed]