Also related: https://github.com/elastic/elasticsearch/issues/10447
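For anyone comparing their own numbers against the thread pool table quoted below, here is a minimal Python sketch that separates "the pool has grown to its maximum" from "the pool actually looks busy". It is only an illustration under stated assumptions: the helper names (`parse_rows`, `grown_to_max`, `looks_busy`) are hypothetical, and the column order (host, type, active, size, queue, rejected, largest, completed, min, max, keepAlive, with the queueSize column blank) is inferred from the quoted table rather than taken from any API.

```python
from typing import List, NamedTuple


class PoolRow(NamedTuple):
    host: str
    active: int
    size: int
    queue: int
    rejected: int
    max_size: int


def parse_rows(text: str) -> List[PoolRow]:
    """Parse whitespace-separated rows of the quoted table (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        tokens = line.split()
        # Assumed layout: 11 tokens per row because queueSize is blank.
        # Header lines and rows in any other layout are skipped.
        if len(tokens) != 11 or not tokens[2].isdigit():
            continue
        rows.append(PoolRow(
            host=tokens[0],
            active=int(tokens[2]),
            size=int(tokens[3]),
            queue=int(tokens[4]),
            rejected=int(tokens[5]),
            max_size=int(tokens[9]),
        ))
    return rows


def grown_to_max(rows: List[PoolRow]) -> List[str]:
    """Hosts whose pool size has reached the configured maximum."""
    return [r.host for r in rows if r.size >= r.max_size]


def looks_busy(rows: List[PoolRow]) -> List[str]:
    """Hosts with every thread active, a non-empty queue, or rejections."""
    return [r.host for r in rows
            if r.active >= r.size or r.queue > 0 or r.rejected > 0]


# Rows copied from the quoted message, one node per line.
SAMPLE = """
coordinator01 scaling 1 2 0 0 2 37884 1 20 5m
search02 scaling 1 20 0 0 20 1945337 1 20 5m
search01 scaling 1 20 0 0 20 2034838 1 20 5m
search03 scaling 1 20 0 0 20 1862848 1 20 5m
coordinator03 scaling 1 2 0 0 2 37875 1 20 5m
coordinator02 scaling 2 5 0 0 5 44127 1 20 5m
"""

rows = parse_rows(SAMPLE)
print(grown_to_max(rows))  # ['search02', 'search01', 'search03']
print(looks_busy(rows))    # []
```

On the quoted snapshot the three data nodes have grown to 20 threads, but none of them shows all threads active, queued work, or rejections. That may be consistent with the kernel-level cause reported in the follow-up, although a single snapshot cannot prove it.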

On 17 April 2015 at 12:37, Charlie Moad <charlie.m...@geofeedia.com> wrote:

> This was tracked down to a problem with Ubuntu 14.04 running under Xen (in
> AWS). The latest kernel in Ubuntu resolves the problem, so I had to do a
> rolling "apt-get update; apt-get dist-upgrade; reboot" on all nodes. This
> appears to have resolved the issue.
>
> For reference:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811
>
>
> On Thursday, April 16, 2015 at 11:20:06 AM UTC-4, Charlie Moad wrote:
>>
>> A few days ago we started to receive a lot of timeouts across our
>> cluster. This is causing shard allocation to fail and a perpetual
>> red/yellow state.
>>
>> Examples:
>> [2015-04-16 15:04:50,970][DEBUG][action.admin.cluster.node.stats] [coordinator02] failed to execute on node [1rfWT-mXTZmF_NzR_h1IZw]
>> org.elasticsearch.transport.ReceiveTimeoutTransportException: [search01][inet[ip-172-30-11-161.ec2.internal/172.30.11.161:9300]][cluster:monitor/nodes/stats[n]] request_id [3680727] timed out after [15001ms]
>>     at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> [2015-04-16 15:03:26,105][WARN ][gateway.local ] [coordinator02] [global.y2014m01d30.v2][0]: failed to list shard stores on node [1rfWT-mXTZmF_NzR_h1IZw]
>> org.elasticsearch.action.FailedNodeException: Failed node [1rfWT-mXTZmF_NzR_h1IZw]
>>     at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.onFailure(TransportNodesOperationAction.java:206)
>>     at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$1000(TransportNodesOperationAction.java:97)
>>     at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4.handleException(TransportNodesOperationAction.java:178)
>>     at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [search01][inet[ip-172-30-11-161.ec2.internal/172.30.11.161:9300]][internal:cluster/nodes/indices/shard/store[n]] request_id [3677537] timed out after [30001ms]
>>     ... 4 more
>>
>> I believe I have tracked this down to the management thread pool being
>> saturated on our data nodes and not responding to requests. Our cluster has
>> 3 master nodes, no data and 3 worker nodes, no master. I increased the
>> maximum pool size from 5 to 20 and the workers immediately jumped to 20.
>> I'm still seeing the errors.
>>
>> host          management.type management.active management.size management.queue management.queueSize management.rejected management.largest management.completed management.min management.max management.keepAlive
>> coordinator01 scaling         1                 2               0                                     0                   2                  37884                1              20             5m
>> search02      scaling         1                 20              0                                     0                   20                 1945337              1              20             5m
>> search01      scaling         1                 20              0                                     0                   20                 2034838              1              20             5m
>> search03      scaling         1                 20              0                                     0                   20                 1862848              1              20             5m
>> coordinator03 scaling         1                 2               0                                     0                   2                  37875                1              20             5m
>> coordinator02 scaling         2                 5               0                                     0                   5                  44127                1              20             5m
>>
>> How can I address this problem?
>>
>> Thanks,
>> Charlie
>>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c49d0468-2d02-49f7-8356-4b9865842eb0%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/c49d0468-2d02-49f7-8356-4b9865842eb0%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.