Oops, sorry.  That was a copy/paste error.  It is using 16GB.  Here are
the correct process arguments:

/usr/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.headless=true -server
-XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.path.home=/usr/share/elasticsearch [snip CP]
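
For anyone who wants to check the same thing on their own nodes, here is a minimal Python sketch that pulls -Xms/-Xmx out of a java command line and compares the heap with the machine's RAM. The 30GB figure comes from the original post below; the helper name parse_heap_flags is made up for this example. The usual guidance is to leave roughly half of the RAM to the filesystem cache, so a 16GB heap on this box sits right around that line.

```python
import re

# Multipliers for the size suffixes handled here (-Xms/-Xmx style values).
UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_heap_flags(cmdline):
    """Return the -Xms/-Xmx values found in cmdline, in bytes (hypothetical helper)."""
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", cmdline):
        sizes[flag] = int(number) * UNITS[unit.lower()]
    return sizes

# Start of the command line as posted above (remaining flags and classpath omitted).
cmdline = "/usr/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.headless=true -server"

heap = parse_heap_flags(cmdline)
ram_bytes = 30 * 1024**3  # c3.4xlarge RAM as stated in the original post
heap_fraction = heap["Xmx"] / ram_bytes

print("Xms == Xmx:", heap["Xms"] == heap["Xmx"])
print("heap share of RAM: {:.0%}".format(heap_fraction))
```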


On Thu, Jul 31, 2014 at 2:43 AM, David Pilato <> wrote:

> Why do you start with 8gb HEAP? Can't you give 16gb or so?
> /usr/bin/java -Xms8g -Xmx8g
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> On 30 Jul 2014, at 19:47, Chris Neal <> wrote:
> Hi everyone,
> First off, apologies for the thread.  I know OOME discussions are somewhat
> overdone in the group, but I need to reach out for some help for this one.
> I have a 2 node development cluster in EC2 on c3.4xlarge instances.  That means
> 16 vCPUs, 30GB RAM, 1Gb network, and I have two 500GB EBS volumes for
> Elasticsearch data on each instance.
> I'm running Java 1.7.0_55, and using the G1 collector.  The Java args are:
> /usr/bin/java -Xms8g -Xmx8g -Xss256k -Djava.awt.headless=true -server
> -XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
> The index has 2 shards, each with 1 replica.
> I have a daily index being filled with application log data.  The index,
> on average, gets to be about:
> 486M documents
> 53.1GB (primary size)
> 106.2GB (total size)
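
To put those numbers in per-shard terms, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (2 shards, 1 replica); treating "GB" as 1024**3 bytes is an assumption, so the average document size is approximate.

```python
# Figures from the post: daily index, 2 primary shards, 1 replica each.
documents = 486_000_000
primary_gb = 53.1
total_gb = 106.2
primary_shards = 2
replicas = 1

docs_per_shard = documents // primary_shards
gb_per_primary_shard = primary_gb / primary_shards
expected_total_gb = primary_gb * (1 + replicas)  # primaries plus replica copies

# Assumption: "GB" means 1024**3 bytes here.
avg_doc_bytes = primary_gb * 1024**3 / documents

print("docs per primary shard:", docs_per_shard)
print("GB per primary shard: {:.2f}".format(gb_per_primary_shard))
print("expected total GB: {:.1f} (reported: {})".format(expected_total_gb, total_gb))
print("average bytes per document: {:.0f}".format(avg_doc_bytes))
```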
> Other than indexing, there really is nothing going on in the cluster.  No
> searches, or percolators, just collecting data.
> I have:
>    - Tweaked the index.merge.policy
>    - Tweaked the indices.fielddata.breaker.limit and cache.size
>    - Changed the index refresh_interval from 1s to 60s
>    - Created a default template for the index such that _all is disabled,
>    and all fields in the mapping are set to "not_analyzed".
> Here is my complete elasticsearch.yml:
> action:
>   disable_delete_all_indices: true
> cluster:
>   name: elasticsearch-dev
> discovery:
>   zen:
>     minimum_master_nodes: 2
>     ping:
>       multicast:
>         enabled: false
>       unicast:
>         hosts:,
> gateway:
>   recover_after_nodes: 2
> index:
>   merge:
>     policy:
>       max_merge_at_once: 5
>       max_merged_segment: 15gb
>   number_of_replicas: 1
>   number_of_shards: 2
>   refresh_interval: 60s
> indices:
>   fielddata:
>     breaker:
>       limit: 50%
>     cache:
>       size: 30%
> node:
>   name: elasticsearch-ip-10-0-0-45
> path:
>   data:
>       - /usr/local/ebs01/elasticsearch
>       - /usr/local/ebs02/elasticsearch
> threadpool:
>   bulk:
>     queue_size: 500
>     size: 75
>     type: fixed
>   get:
>     queue_size: 200
>     size: 100
>     type: fixed
>   index:
>     queue_size: 1000
>     size: 100
>     type: fixed
>   search:
>     queue_size: 200
>     size: 100
>     type: fixed
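
As a sanity check on what this config asks of the node, here is a small Python sketch that turns the percentages and pool sizes into absolute numbers. It assumes the fielddata percentages are taken against the JVM heap (16GB, per the -Xmx16g above), which is my understanding of these settings rather than something confirmed in this thread.

```python
heap_bytes = 16 * 1024**3  # -Xmx16g
vcpus = 16                 # c3.4xlarge, per the post

# Values copied from the elasticsearch.yml above.
fielddata = {"breaker.limit": "50%", "cache.size": "30%"}
threadpool_sizes = {"bulk": 75, "get": 100, "index": 100, "search": 100}

def percent_of(total, text):
    """Apply a setting such as '50%' to total."""
    return total * float(text.rstrip("%")) / 100

# Assumption: both percentages are relative to the JVM heap.
budgets_gb = {name: percent_of(heap_bytes, pct) / 1024**3
              for name, pct in fielddata.items()}

total_threads = sum(threadpool_sizes.values())
threads_per_vcpu = total_threads / vcpus

for name, gb in budgets_gb.items():
    print("indices.fielddata.{}: {:.1f} GB".format(name, gb))
print("fixed pool threads: {} ({:.1f} per vCPU)".format(total_threads, threads_per_vcpu))
```

The thread count is worth a look on its own: the fixed pools add up to far more threads than the box has vCPUs, and each busy bulk or index thread holds request data on the heap while it works.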
> The heap sits at about 13GB used.  I had been battling OOMEs for
> a while, and thought I had it licked, but one just popped up again.  My
> cluster has been up and running fine for 14 days, and I just got this OOME:
> =====
> [2014-07-30 11:52:28,394][INFO ][monitor.jvm              ]
> [elasticsearch-ip-10-0-0-41] [gc][young][1158834][109906] duration [770ms],
> collections [1]/[1s], total [770ms]/[43.2m], memory
> [13.4gb]->[13.4gb]/[16gb], all_pools {[young]
> [648mb]->[8mb]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old]
> [12.8gb]->[13.4gb]/[16gb]}
> [2014-07-30 15:03:01,070][WARN ][index.engine.internal    ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed engine [out of
> memory]
> [2014-07-30 15:03:10,324][WARN
> ][] Unexpected exception in the
> selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:10,335][WARN
> ][] Unexpected exception in the
> selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:10,324][WARN ][index.merge.scheduler    ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to merge
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:28,595][WARN ][index.translog           ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to flush shard
> on translog threshold
> org.elasticsearch.index.engine.FlushFailedEngineException:
> [derbysoft-20140730][0] Flush failed
> at
> org.elasticsearch.index.engine.internal.InternalEngine.flush(
>  at
> org.elasticsearch.index.shard.service.InternalIndexShard.flush(
> at
> org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> at
> java.util.concurrent.ThreadPoolExecutor$
>  at
> Caused by: java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
> at org.apache.lucene.index.IndexWriter.startCommit(
>  at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(
> at
> org.apache.lucene.index.IndexWriter.commitInternal(
>  at org.apache.lucene.index.IndexWriter.commit(
> at
> org.elasticsearch.index.engine.internal.InternalEngine.flush(
>  ... 5 more
> [2014-07-30 15:03:28,658][WARN ][cluster.action.shard     ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] sending failed shard
> for [derbysoft-20140730][0], node[W-7FsjjZTyOXZdaJhhqxEA], [R], s[STARTED],
> indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [engine failure, message [out of
> memory][IllegalStateException[this writer hit an OutOfMemoryError; cannot
> commit]]]
> [2014-07-30 15:34:36,418][WARN
> ][] Unexpected exception in the
> selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:39,847][WARN
> ][] Unexpected exception in the
> selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:42,873][WARN ][index.merge.scheduler    ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed to merge
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:42,873][WARN ][index.engine.internal    ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed engine [merge
> exception]
> [2014-07-30 15:34:43,185][WARN ][cluster.action.shard     ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] sending failed shard
> for [derbysoft-20140730][1], node[W-7FsjjZTyOXZdaJhhqxEA], [P], s[STARTED],
> indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [engine failure, message [merge
> exception][MergeException[java.lang.OutOfMemoryError: Java heap space];
> nested: OutOfMemoryError[Java heap space]; ]]
> [2014-07-30 15:57:42,531][WARN ][indices.recovery         ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] recovery from
> [[elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][
> -west-2.compute.internal][inet[/]]] failed
> org.elasticsearch.transport.RemoteTransportException:
> [elasticsearch-ip-10-0-0-45][inet[/
> ]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
> [derbysoft-20140730][1] Phase[2] Execution failed
>  at
> org.elasticsearch.index.engine.internal.InternalEngine.recover(
> at
> org.elasticsearch.index.shard.service.InternalIndexShard.recover(
>  at
> org.elasticsearch.indices.recovery.RecoverySource.recover(
> at
> org.elasticsearch.indices.recovery.RecoverySource.access$1600(
>  at
> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
> at
> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
>  at
> org.elasticsearch.transport.netty.MessageChannelHandler$
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
>  at
> java.util.concurrent.ThreadPoolExecutor$
> at
> Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException:
> [elasticsearch-ip-10-0-0-41][inet[/]][index/shard/recovery/prepareTranslog]
> request_id [13988539] timed out after [900000ms]
>  at
> org.elasticsearch.transport.TransportService$
> ... 3 more
> [2014-07-30 15:57:42,534][WARN ][indices.cluster          ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException:
> [derbysoft-20140730][1]: Recovery failed from
> [elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][
> -west-2.compute.internal][inet[/]] into
> [elasticsearch-ip-10-0-0-41][W-7FsjjZTyOXZdaJhhqxEA][
> -west-2.compute.internal][inet[
>  at
> org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(
> at
> org.elasticsearch.indices.recovery.RecoveryTarget.access$300(
>  at
> org.elasticsearch.indices.recovery.RecoveryTarget$
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
>  at
> java.util.concurrent.ThreadPoolExecutor$
> at
> Caused by: org.elasticsearch.transport.RemoteTransportException:
> [elasticsearch-ip-10-0-0-45][inet[/
> ]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
> [derbysoft-20140730][1] Phase[2] Execution failed
>  at
> org.elasticsearch.index.engine.internal.InternalEngine.recover(
> at
> org.elasticsearch.index.shard.service.InternalIndexShard.recover(
>  at
> org.elasticsearch.indices.recovery.RecoverySource.recover(
> at
> org.elasticsearch.indices.recovery.RecoverySource.access$1600(
>  at
> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
> at
> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
>  at
> org.elasticsearch.transport.netty.MessageChannelHandler$
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
>  at
> java.util.concurrent.ThreadPoolExecutor$
> at
> Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException:
> [elasticsearch-ip-10-0-0-41][inet[/]][index/shard/recovery/prepareTranslog]
> request_id [13988539] timed out after [900000ms]
>  at
> org.elasticsearch.transport.TransportService$
> ... 3 more
> [2014-07-30 15:57:42,535][WARN ][cluster.action.shard     ]
> [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] sending failed shard
> for [derbysoft-20140730][1], node[W-7FsjjZTyOXZdaJhhqxEA], [R],
> s[INITIALIZING], indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [Failed to
> start shard, message [RecoveryFailedException[[derbysoft-20140730][1]:
> Recovery failed from [elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][
>][inet[/]] into
> [elasticsearch-ip-10-0-0-41][W-7FsjjZTyOXZdaJhhqxEA][
> -west-2.compute.internal][inet[
> -west-2.compute.internal/]]]; nested:
> RemoteTransportException[[elasticsearch-ip-10-0-0-45][inet[/]][index/shard/recovery/startRecovery]];
> nested: RecoveryEngineException[[derbysoft-20140730][1] Phase[2] Execution
> failed]; nested:
> ReceiveTimeoutTransportException[[elasticsearch-ip-10-0-0-41][inet[/]][index/shard/recovery/prepareTranslog]
> request_id [13988539] timed out after [900000ms]]; ]]
> =====
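
To read the heap numbers out of the monitor.jvm line at the top of that log, something like the following Python sketch would do. The log line is the one quoted above with the mail wrapping joined back together; the regular expressions are written against that single line and are not a general parser for Elasticsearch logs.

```python
import re

# The [gc][young] line from the log above, with the email line wrapping removed.
line = (
    "[2014-07-30 11:52:28,394][INFO ][monitor.jvm              ] "
    "[elasticsearch-ip-10-0-0-41] [gc][young][1158834][109906] duration [770ms], "
    "collections [1]/[1s], total [770ms]/[43.2m], memory "
    "[13.4gb]->[13.4gb]/[16gb], all_pools {[young] "
    "[648mb]->[8mb]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old] "
    "[12.8gb]->[13.4gb]/[16gb]}"
)

# Whole-heap usage before and after the collection, and the heap maximum.
heap = re.search(r"memory \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]", line)
heap_before, heap_after, heap_max = map(float, heap.groups())

# Old generation usage before and after the same collection.
old = re.search(r"\[old\] \[([\d.]+)gb\]->\[([\d.]+)gb\]", line)
old_before, old_after = map(float, old.groups())

utilization = heap_after / heap_max
old_growth = old_after - old_before

print("heap after GC: {:.1f} of {:.0f} GB ({:.0%})".format(heap_after, heap_max, utilization))
print("old gen growth during this collection: {:.1f} GB".format(old_growth))
```

In this line the young collection emptied the young pool, but total heap usage did not drop because the old generation grew by about the same amount.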
> I'm a bit at a loss as to what to try next to address this problem.  Can
> anyone offer a suggestion?
> Thanks for reading this.
> Chris
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> To view this discussion on the web visit
> <>
> .
> For more options, visit