It'd be worth raising this as an issue on GitHub if you are concerned; at
least then the ES devs will see it :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 18:34, Martin Forssen <m...@recordedfuture.com> wrote:

> I see the same problem. We are running 1.1.1 on a 13-node cluster (3
> master and 5+5 data). I see stuck threads on most of the data nodes, so I
> had a look around on one of them. top in thread mode shows:
> top - 08:08:20 up 62 days, 18:49,  1 user,  load average: 9.18, 13.21, 12.67
> Threads: 528 total,  14 running, 514 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 39.0 us,  1.5 sy,  0.0 ni, 59.0 id,  0.2 wa,  0.2 hi,  0.0 si,  0.1 st
> KiB Mem:  62227892 total, 61933428 used,   294464 free,    65808 buffers
> KiB Swap: 61865980 total,    19384 used, 61846596 free. 24645668 cached Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
>  3743 elastic+  20   0  1.151t 0.045t 0.013t S 93.4 78.1  17462:00 java
>  3748 elastic+  20   0  1.151t 0.045t 0.013t S 93.4 78.1  17457:55 java
>  3761 elastic+  20   0  1.151t 0.045t 0.013t S 93.1 78.1  17455:21 java
>  3744 elastic+  20   0  1.151t 0.045t 0.013t S 92.7 78.1  17456:55 java
>  1758 elastic+  20   0  1.151t 0.045t 0.013t R  5.9 78.1   3450:01 java
>  1755 elastic+  20   0  1.151t 0.045t 0.013t R  5.6 78.1   3450:05 java
>
> So I have four threads consuming far more CPU than the others, while the
> node is only doing a moderate amount of garbage collection. Running jstack
> I find that all of the stuck threads have a stack dump which looks like this:
> Thread 3744: (state = IN_JAVA)
>  - java.util.HashMap.getEntry(java.lang.Object) @bci=72, line=446 (Compiled frame; information may be imprecise)
>  - java.util.HashMap.get(java.lang.Object) @bci=11, line=405 (Compiled frame)
>  - org.elasticsearch.search.scan.ScanContext$ScanFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=8, line=156 (Compiled frame)
>  - org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=6, line=45 (Compiled frame)
>  - org.apache.lucene.search.FilteredQuery$1.scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean, org.apache.lucene.util.Bits) @bci=34, line=130 (Compiled frame)
>  - org.apache.lucene.search.IndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=68, line=618 (Compiled frame)
>  - org.elasticsearch.search.internal.ContextIndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=225, line=173 (Compiled frame)
>  - org.apache.lucene.search.IndexSearcher.search(org.apache.lucene.search.Query, org.apache.lucene.search.Collector) @bci=11, line=309 (Interpreted frame)
>  - org.elasticsearch.search.scan.ScanContext.execute(org.elasticsearch.search.internal.SearchContext) @bci=54, line=52 (Interpreted frame)
>  - org.elasticsearch.search.query.QueryPhase.execute(org.elasticsearch.search.internal.SearchContext) @bci=174, line=119 (Compiled frame)
>  - org.elasticsearch.search.SearchService.executeScan(org.elasticsearch.search.internal.InternalScrollSearchRequest) @bci=49, line=233 (Interpreted frame)
>  - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.search.internal.InternalScrollSearchRequest, org.elasticsearch.transport.TransportChannel) @bci=8, line=791 (Interpreted frame)
>  - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.transport.TransportRequest, org.elasticsearch.transport.TransportChannel) @bci=6, line=780 (Interpreted frame)
>  - org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run() @bci=12, line=270 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)
>
> The state varies between IN_JAVA and BLOCKED. I took two stack traces 10
> minutes apart and they were identical for the suspect threads.
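>
> For what it's worth, the same check can be scripted from inside the JVM. A
> rough sketch using ThreadMXBean, assuming per-thread CPU time measurement is
> enabled (the 10-second window and the 9-second threshold are arbitrary, just
> for illustration):
>
>     import java.lang.management.ManagementFactory;
>     import java.lang.management.ThreadInfo;
>     import java.lang.management.ThreadMXBean;
>
>     public class SpinDetector {
>         public static void main(String[] args) throws InterruptedException {
>             ThreadMXBean mx = ManagementFactory.getThreadMXBean();
>             long[] ids = mx.getAllThreadIds();
>             long[] cpu = new long[ids.length];
>             StackTraceElement[] top = new StackTraceElement[ids.length];
>             // First sample: CPU time and top stack frame per thread.
>             for (int i = 0; i < ids.length; i++) {
>                 cpu[i] = mx.getThreadCpuTime(ids[i]);
>                 top[i] = topFrame(mx.getThreadInfo(ids[i], 1));
>             }
>             Thread.sleep(10000);
>             // Second sample: flag threads that burned more than 9s of CPU
>             // in the 10s window without ever changing their top frame.
>             for (int i = 0; i < ids.length; i++) {
>                 long burned = mx.getThreadCpuTime(ids[i]) - cpu[i];
>                 StackTraceElement now = topFrame(mx.getThreadInfo(ids[i], 1));
>                 if (burned > 9000000000L && now != null && now.equals(top[i])) {
>                     System.out.println("possibly stuck: " + ids[i] + " at " + now);
>                 }
>             }
>         }
>
>         static StackTraceElement topFrame(ThreadInfo info) {
>             if (info == null || info.getStackTrace().length == 0) return null;
>             return info.getStackTrace()[0];
>         }
>     }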
>
> I suppose this could be a very long-running query, but I wonder if it isn't
> just stuck. Perhaps we are seeing this issue:
> http://stackoverflow.com/questions/17070184/hashmap-stuck-on-get
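>
> For illustration, this is the failure mode that answer describes: in the
> pre-Java-8 HashMap, a resize racing with concurrent writers can leave a
> bucket's entry chain pointing back at itself, after which any get() on that
> bucket walks the cycle forever. A minimal, hypothetical reproducer (not the
> ES code; being a race it is not deterministic, but it usually wedges within
> a few seconds):
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     public class HashMapSpin {
>         // The bug: a plain HashMap shared by threads with no synchronization.
>         static final Map<Integer, Integer> MAP = new HashMap<Integer, Integer>();
>
>         public static void main(String[] args) {
>             // Two writers racing through resizes can corrupt a bucket chain.
>             for (int t = 0; t < 2; t++) {
>                 new Thread(new Runnable() {
>                     public void run() {
>                         for (int i = 0; ; i++) {
>                             MAP.put(i, i);
>                             if (i % 100000 == 0) MAP.clear();
>                         }
>                     }
>                 }).start();
>             }
>             // Reader: once a chain is cyclic, this call never returns, with
>             // the stack parked in HashMap.getEntry just like the traces above.
>             while (true) {
>                 MAP.get(42);
>             }
>         }
>     }
>
> If ScanContext really does share a HashMap across search threads like this,
> then synchronizing access, or switching to a ConcurrentHashMap, would explain
> away (and fix) the stuck threads.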
>
