Oops, sorry. That was a copy/paste error. It is using 16GB. Here are the correct process arguments:
/usr/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.headless=true -server -XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch [snip CP]

Thanks!
Chris

On Thu, Jul 31, 2014 at 2:43 AM, David Pilato <da...@pilato.fr> wrote:

> Why do you start with 8gb HEAP? Can't you give it 16gb or so?
>
> /usr/bin/java -Xms8g -Xmx8g
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 30 Jul 2014, at 19:47, Chris Neal <chris.n...@derbysoft.net> wrote:
>
> Hi everyone,
>
> First off, apologies for the thread. I know OOME discussions are somewhat overdone in the group, but I need to reach out for some help on this one.
>
> I have a 2-node development cluster in EC2 on c3.4xlarge instances. That means 16 vCPUs, 30GB RAM, a 1Gb network, and I have two 500GB EBS volumes for Elasticsearch data on each instance.
>
> I'm running Java 1.7.0_55 and using the G1 collector. The Java args are:
>
> /usr/bin/java -Xms8g -Xmx8g -Xss256k -Djava.awt.headless=true -server -XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
>
> The index has 2 shards, each with 1 replica.
>
> I have a daily index being filled with application log data. The index, on average, grows to about:
> 486M documents
> 53.1GB (primary size)
> 106.2GB (total size)
>
> Other than indexing, there really is nothing going on in the cluster. No searches or percolators, just collecting data.
>
> I have:
>
> - Tweaked the index.merge.policy
> - Tweaked the indices.fielddata.breaker.limit and cache.size
> - Changed the index refresh_interval from 1s to 60s
> - Created a default template for the index such that _all is disabled, and all fields in the mapping are set to "not_analyzed".
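[Editor's note: for readers unfamiliar with that last step, a template along those lines might look like the following. This is a sketch only — the template name and index pattern are made up, not taken from the thread — using the Elasticsearch 1.x `_template` API with a dynamic template that maps every string field as not_analyzed.]

```shell
# Hypothetical template name ("logs_template") and index pattern ("derbysoft-*").
# Disables _all and maps all dynamically-added string fields as not_analyzed.
curl -XPUT 'http://localhost:9200/_template/logs_template' -d '{
  "template": "derbysoft-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "dynamic_templates": [
        {
          "strings_not_analyzed": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }
      ]
    }
  }
}'
```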
>
> Here is my complete elasticsearch.yml:
>
> action:
>   disable_delete_all_indices: true
> cluster:
>   name: elasticsearch-dev
> discovery:
>   zen:
>     minimum_master_nodes: 2
>     ping:
>       multicast:
>         enabled: false
>       unicast:
>         hosts: 10.0.0.45,10.0.0.41
> gateway:
>   recover_after_nodes: 2
> index:
>   merge:
>     policy:
>       max_merge_at_once: 5
>       max_merged_segment: 15gb
>   number_of_replicas: 1
>   number_of_shards: 2
>   refresh_interval: 60s
> indices:
>   fielddata:
>     breaker:
>       limit: 50%
>     cache:
>       size: 30%
> node:
>   name: elasticsearch-ip-10-0-0-45
> path:
>   data:
>     - /usr/local/ebs01/elasticsearch
>     - /usr/local/ebs02/elasticsearch
> threadpool:
>   bulk:
>     queue_size: 500
>     size: 75
>     type: fixed
>   get:
>     queue_size: 200
>     size: 100
>     type: fixed
>   index:
>     queue_size: 1000
>     size: 100
>     type: fixed
>   search:
>     queue_size: 200
>     size: 100
>     type: fixed
>
> The heap sits at about 13GB used. I had been battling OOME exceptions for a while, and thought I had it licked, but one just popped up again. My cluster had been up and running fine for 14 days, and then I got this OOME:
>
> =====
> [2014-07-30 11:52:28,394][INFO ][monitor.jvm              ] [elasticsearch-ip-10-0-0-41] [gc][young][1158834][109906] duration [770ms], collections [1]/[1s], total [770ms]/[43.2m], memory [13.4gb]->[13.4gb]/[16gb], all_pools {[young] [648mb]->[8mb]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old] [12.8gb]->[13.4gb]/[16gb]}
> [2014-07-30 15:03:01,070][WARN ][index.engine.internal    ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed engine [out of memory]
> [2014-07-30 15:03:10,324][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:10,335][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:10,324][WARN ][index.merge.scheduler    ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to merge
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:03:28,595][WARN ][index.translog           ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to flush shard on translog threshold
> org.elasticsearch.index.engine.FlushFailedEngineException: [derbysoft-20140730][0] Flush failed
>     at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:604)
>     at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:202)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
>     at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4416)
>     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2989)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3096)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
>     at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:797)
>     ... 5 more
> [2014-07-30 15:03:28,658][WARN ][cluster.action.shard     ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] sending failed shard for [derbysoft-20140730][0], node[W-7FsjjZTyOXZdaJhhqxEA], [R], s[STARTED], indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [engine failure, message [out of memory][IllegalStateException[this writer hit an OutOfMemoryError; cannot commit]]]
> [2014-07-30 15:34:36,418][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:39,847][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:42,873][WARN ][index.merge.scheduler    ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed to merge
> java.lang.OutOfMemoryError: Java heap space
> [2014-07-30 15:34:42,873][WARN ][index.engine.internal    ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed engine [merge exception]
> [2014-07-30 15:34:43,185][WARN ][cluster.action.shard     ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] sending failed shard for [derbysoft-20140730][1], node[W-7FsjjZTyOXZdaJhhqxEA], [P], s[STARTED], indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [engine failure, message [merge exception][MergeException[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space]; ]]
> [2014-07-30 15:57:42,531][WARN ][indices.recovery         ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] recovery from [[elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][ip-10-0-0-45.us-west-2.compute.internal][inet[/10.0.0.45:9300]]] failed
> org.elasticsearch.transport.RemoteTransportException: [elasticsearch-ip-10-0-0-45][inet[/10.0.0.45:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [derbysoft-20140730][1] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1011)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:631)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:122)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:62)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:351)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [elasticsearch-ip-10-0-0-41][inet[/10.0.0.41:9300]][index/shard/recovery/prepareTranslog] request_id [13988539] timed out after [900000ms]
>     at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:369)
>     ... 3 more
> [2014-07-30 15:57:42,534][WARN ][indices.cluster          ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException: [derbysoft-20140730][1]: Recovery failed from [elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][ip-10-0-0-45.us-west-2.compute.internal][inet[/10.0.0.45:9300]] into [elasticsearch-ip-10-0-0-41][W-7FsjjZTyOXZdaJhhqxEA][ip-10-0-0-41.us-west-2.compute.internal][inet[ip-10-0-0-41.us-west-2.compute.internal/10.0.0.41:9300]]
>     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:306)
>     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-ip-10-0-0-45][inet[/10.0.0.45:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [derbysoft-20140730][1] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1011)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:631)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:122)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:62)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:351)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [elasticsearch-ip-10-0-0-41][inet[/10.0.0.41:9300]][index/shard/recovery/prepareTranslog] request_id [13988539] timed out after [900000ms]
>     at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:369)
>     ... 3 more
> [2014-07-30 15:57:42,535][WARN ][cluster.action.shard     ] [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][1] sending failed shard for [derbysoft-20140730][1], node[W-7FsjjZTyOXZdaJhhqxEA], [R], s[INITIALIZING], indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [Failed to start shard, message [RecoveryFailedException[[derbysoft-20140730][1]: Recovery failed from [elasticsearch-ip-10-0-0-45][AjN-6_DHQK6B8NJgfphMvA][ip-10-0-0-45.us-west-2.compute.internal][inet[/10.0.0.45:9300]] into [elasticsearch-ip-10-0-0-41][W-7FsjjZTyOXZdaJhhqxEA][ip-10-0-0-41.us-west-2.compute.internal][inet[ip-10-0-0-41.us-west-2.compute.internal/10.0.0.41:9300]]]; nested: RemoteTransportException[[elasticsearch-ip-10-0-0-45][inet[/10.0.0.45:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[derbysoft-20140730][1] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[elasticsearch-ip-10-0-0-41][inet[/10.0.0.41:9300]][index/shard/recovery/prepareTranslog] request_id [13988539] timed out after [900000ms]]; ]]
> =====
>
> I'm a bit at a loss as to what to try next to address this problem. Can anyone offer a suggestion?
>
> Thanks for reading this.
>
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAND3DpjXh8Bcrfuy6GLvFOoDH7yUMLr1%3DA2Sb0SFQx6PgkQEvA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.