It'd be worth raising this as an issue on GitHub if you are concerned; at least then the ES devs will see it :)
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 27 August 2014 18:34, Martin Forssen <m...@recordedfuture.com> wrote:
> I see the same problem. We are running 1.1.1 on a 13-node cluster (3
> master and 5+5 data). I see stuck threads on most of the data nodes, and I
> had a look around on one of them. top in thread mode shows:
>
> top - 08:08:20 up 62 days, 18:49, 1 user, load average: 9.18, 13.21, 12.67
> Threads: 528 total, 14 running, 514 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 39.0 us, 1.5 sy, 0.0 ni, 59.0 id, 0.2 wa, 0.2 hi, 0.0 si, 0.1 st
> KiB Mem:  62227892 total, 61933428 used,   294464 free,    65808 buffers
> KiB Swap: 61865980 total,    19384 used, 61846596 free, 24645668 cached Mem
>
>   PID USER      PR  NI   VIRT   RES    SHR  S %CPU %MEM    TIME+ COMMAND
>  3743 elastic+  20   0 1.151t 0.045t 0.013t S 93.4 78.1 17462:00 java
>  3748 elastic+  20   0 1.151t 0.045t 0.013t S 93.4 78.1 17457:55 java
>  3761 elastic+  20   0 1.151t 0.045t 0.013t S 93.1 78.1 17455:21 java
>  3744 elastic+  20   0 1.151t 0.045t 0.013t S 92.7 78.1 17456:55 java
>  1758 elastic+  20   0 1.151t 0.045t 0.013t R  5.9 78.1  3450:01 java
>  1755 elastic+  20   0 1.151t 0.045t 0.013t R  5.6 78.1  3450:05 java
>
> So I have four threads consuming far more CPU than the others. The node is
> only doing a moderate amount of garbage collection.
> Running jstack, I find that all the stuck threads have a stack dump which
> looks like this:
>
> Thread 3744: (state = IN_JAVA)
>  - java.util.HashMap.getEntry(java.lang.Object) @bci=72, line=446 (Compiled frame; information may be imprecise)
>  - java.util.HashMap.get(java.lang.Object) @bci=11, line=405 (Compiled frame)
>  - org.elasticsearch.search.scan.ScanContext$ScanFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=8, line=156 (Compiled frame)
>  - org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=6, line=45 (Compiled frame)
>  - org.apache.lucene.search.FilteredQuery$1.scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean, org.apache.lucene.util.Bits) @bci=34, line=130 (Compiled frame)
>  - org.apache.lucene.search.IndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=68, line=618 (Compiled frame)
>  - org.elasticsearch.search.internal.ContextIndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=225, line=173 (Compiled frame)
>  - org.apache.lucene.search.IndexSearcher.search(org.apache.lucene.search.Query, org.apache.lucene.search.Collector) @bci=11, line=309 (Interpreted frame)
>  - org.elasticsearch.search.scan.ScanContext.execute(org.elasticsearch.search.internal.SearchContext) @bci=54, line=52 (Interpreted frame)
>  - org.elasticsearch.search.query.QueryPhase.execute(org.elasticsearch.search.internal.SearchContext) @bci=174, line=119 (Compiled frame)
>  - org.elasticsearch.search.SearchService.executeScan(org.elasticsearch.search.internal.InternalScrollSearchRequest) @bci=49, line=233 (Interpreted frame)
>  - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.search.internal.InternalScrollSearchRequest, org.elasticsearch.transport.TransportChannel) @bci=8, line=791 (Interpreted frame)
>  - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.transport.TransportRequest, org.elasticsearch.transport.TransportChannel) @bci=6, line=780 (Interpreted frame)
>  - org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run() @bci=12, line=270 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)
>
> The state varies between IN_JAVA and BLOCKED. I took two stack traces 10
> minutes apart and they were identical for the suspect threads.
>
> I assume this could be a very long-running query, but I wonder if it isn't
> just stuck. Perhaps we are seeing this issue:
> http://stackoverflow.com/questions/17070184/hashmap-stuck-on-get
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/78b9300c-02c3-4aba-be0e-98d92b97ee7d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624YxRt0Sh8f8O0dvfKoWPz4RsDTSk-WnkR7XAdgo-%2BzGSg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
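[Editor's note] The Stack Overflow question linked above describes the classic failure mode behind traces like this one: java.util.HashMap is not thread-safe, and if one thread resizes the table while another thread is inside get(), a bucket's linked list can become circular, leaving get() spinning at 100% CPU forever. The standard fix is to replace any map shared between search threads with a ConcurrentHashMap. The sketch below illustrates that fix pattern only; the class and field names are hypothetical and not Elasticsearch's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class SafeScanCache {
    // Hypothetical stand-in for a per-reader cache consulted by many
    // search threads at once. With a plain HashMap, concurrent put()
    // and get() calls can corrupt the bucket chains and hang get().
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    public Integer lookup(String key) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so a resize
        // racing with a read can no longer produce a cyclic bucket.
        return cache.computeIfAbsent(key, k -> k.length());
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeScanCache c = new SafeScanCache();
        int threads = 8;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                // 100 distinct suffix-0/1 keys per parity class: 200 keys total.
                for (int i = 0; i < 10_000; i++) {
                    c.lookup("segment-" + (i % 100) + "-" + (id % 2));
                }
                done.countDown();
            }).start();
        }
        done.await(); // with an unsynchronized HashMap this could hang forever
        System.out.println("entries=" + c.size());
    }
}
```

Under this pattern the hammering threads always terminate and the map ends up with exactly the expected 200 entries, whereas the same loop over a bare HashMap may never return from get().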