I have a 10-machine cluster where, roughly once a day when indexing and querying are at their peak, one Elasticsearch node goes OOM. It usually recovers, but by then the cluster is already redistributing the lost shards, which adds more load and often causes an OOM on another machine in turn. Each machine has 32GB of memory, of which I currently allocate 12GB to Elasticsearch. Logstash (max 500MB) and Redis (max 2GB) also run on the machines, and the remaining ~17GB is used for file cache. In other words, everything looks healthy right up until Elasticsearch spews a sequence of errors such as this one:
Actual Exception org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2014.06.24][1]: query[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:127)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space

Failed to send error message back to client for action [search/phase/query]
java.lang.OutOfMemoryError: Java heap space

Actual Exception org.elasticsearch.index.IndexShardMissingException: [logstash-2014.06.25][3] missing
    at org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:182)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:496)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Any ideas what might be going wrong here, or what I might be able to do to remedy the situation?

Cheers,
-Robin-