OutOfMemoryError on marvel node brought down the production cluster

T Vinod Gupta Thu, 17 Apr 2014 13:08:20 -0700

hi,
in my setup, the marvel node is separate from the production cluster. the
production nodes send their data to the marvel node. the marvel node hit an
OOM exception. this brings me to the question: how much heap does it need?
i ran it with the default config.


in my prod cluster, i have a load balancer node that is not a data node. it
runs with just 2GB heap. because of the marvel failure, this node was getting
timeouts and, for some strange reason, went down.

what are the best practices here? how can i avoid this in the future?
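
for context, here is a minimal sketch of the kind of heap check i could run against each cluster to catch this earlier. it assumes the input is the parsed JSON body of GET /_nodes/stats/jvm, with heap_used_in_bytes and heap_max_in_bytes under each node's jvm.mem section. the function name and the 85% threshold are my own placeholders, not anything marvel provides.

```python
# Sketch: flag nodes whose JVM heap usage is at or above a threshold.
# `stats` is assumed to be the parsed JSON body of GET /_nodes/stats/jvm.
# The function name and default threshold are hypothetical.

def nodes_near_heap_limit(stats, threshold_percent=85.0):
    """Return (node_name, used_percent) pairs, highest usage first."""
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_in_bytes")
        limit = mem.get("heap_max_in_bytes")
        if used is None or not limit:
            continue  # skip nodes with missing or zero heap figures
        percent = 100.0 * used / limit
        if percent >= threshold_percent:
            # Fall back to the node id when the node name is absent.
            flagged.append((node.get("name", node_id), round(percent, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hand-made sample with invented numbers, not real cluster output.
    sample = {
        "nodes": {
            "id-1": {"name": "node-a", "jvm": {"mem": {
                "heap_used_in_bytes": 1900, "heap_max_in_bytes": 2000}}},
            "id-2": {"name": "node-b", "jvm": {"mem": {
                "heap_used_in_bytes": 500, "heap_max_in_bytes": 2000}}},
        }
    }
    for name, percent in nodes_near_heap_limit(sample):
        print(name, percent)
```

the idea would be to poll both the marvel node and the LB node with this and alert well before either one runs out of heap.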

marvel node -
[2014-04-17 09:13:33,715][WARN ][index.engine.internal    ] [Gorilla-Man] [.marvel-2014.04.17][0] failed engine
java.lang.OutOfMemoryError: Java heap space
[2014-04-17 09:13:46,890][ERROR][index.engine.internal    ] [Gorilla-Man] [.marvel-2014.04.17][0] failed to acquire searcher, source search_factory
org.apache.lucene.store.AlreadyClosedException: this ReferenceManager is closed
        at org.apache.lucene.search.ReferenceManager.acquire(ReferenceManager.java:98)
...


ES LB node -
[2014-04-17 00:01:00,567][ERROR][marvel.agent.exporter    ] [Darkoth] create failure (index:[.marvel-2014.04.16] type: [node_stats]): UnavailableShardsException[[.marvel-2014.04.16][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5d9be928]
[2014-04-17 06:41:46,975][ERROR][marvel.agent.exporter    ] [Darkoth] error connecting to [ip-10-68-145-124.ec2.internal:9200]
java.net.SocketTimeoutException: connect timed out
[2014-04-17 18:53:09,969][DEBUG][action.admin.cluster.node.info] [Darkoth] failed to execute on node [L1f57myxQLK1SSRHRFcvFQ]
java.lang.OutOfMemoryError: Java heap space
[2014-04-17 19:35:05,805][DEBUG][action.search.type       ] [Witchfire] [twitter_072013][0], node[5GNeFfbPTGi-1EccVvR7Nw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@2f94d571] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [Mauvais][inet[/10.183.42.216:9300]][search/phase/query]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@4c75d754
        at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
