Hi all

We're running a three-node elasticsearch cluster (two data nodes, one 
data-less node) and using it to store data from logstash.

Every week or two, we see messages like the following in the elasticsearch 
logs: back-to-back old-generation GC pauses of around 15 seconds each (the 
first entry below is cut off):

[24.8gb]->[24.5gb]/[24.8gb], all_pools {[young] 
[865.3mb]->[586mb]/[865.3mb]}{[survivor] [102.5mb]->[0b]/[108.1mb]}{[old] 
[23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:15,066][WARN ][monitor.jvm              ] [es-prod-2] 
[gc][old][1189982][81480] duration [14.9s], collections [1]/[15.7s], total 
[14.9s]/[16.1h], memory 
[24.5gb]->[24.5gb]/[24.8gb], all_pools {[young] 
[586mb]->[592.5mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] 
[23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:30,715][WARN ][monitor.jvm              ] [es-prod-2] 
[gc][old][1189983][81481] duration [14.6s], collections [1]/[15.6s], total 
[14.6s]/[16.1h], memory 
[24.5gb]->[24.5gb]/[24.8gb], all_pools {[young] 
[592.5mb]->[589.1mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] 
[23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:46,705][WARN ][monitor.jvm              ] [es-prod-2] 
[gc][old][1189984][81482] duration [15.2s], collections [1]/[15.9s], total 
[15.2s]/[16.1h], memory 
[24.5gb]->[24.3gb]/[24.8gb], all_pools {[young] 
[589.1mb]->[445.2mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] 
[23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:27:03,630][WARN ][monitor.jvm              ] [es-prod-2] 
[gc][old][1189986][81483] duration [15.8s], collections [1]/[15.9s], total 
[15.8s]/[16.1h], memory 
[24.8gb]->[24.3gb]/[24.8gb], all_pools {[young] 
[865.3mb]->[461.7mb]/[865.3mb]}{[survivor] [91.8mb]->[0b]/[108.1mb]}{[old] 
[23.9gb]->[23.9gb]/[23.9gb]}

When this occurs, search performance becomes very slow. Even a simple `$ 
curl http://es-prod-2:9200` can take around ten seconds.
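(That figure is just from timing the request by hand, e.g. `time curl -s -o 
/dev/null http://es-prod-2:9200`.)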

The daily indexes created by logstash vary between 5M and 80M documents, and 
between 1.5GiB and 25GiB on disk. The data nodes have ES_HEAP_SIZE=25G (we saw 
OOM errors with 15G, and I believe going over ~30GiB is not recommended 
because the JVM loses compressed object pointers above that).
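For reference, the heap is set in the service defaults file, something like 
this (the exact path depends on how elasticsearch was installed):

# e.g. /etc/default/elasticsearch or /etc/sysconfig/elasticsearch, depending on the distro
ES_HEAP_SIZE=25g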

I suspect this occurs when users query over a large number of indexes in 
Kibana.
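For example, a dashboard spanning a few months ends up hitting something 
roughly like this (an illustrative wildcard query across many daily indexes 
with a date histogram facet, not the exact request Kibana sends):

curl "http://es-prod-2:9200/logstash-2014.*/_search?pretty" -d '{
    "size": 0,
    "query": { "match_all": {} },
    "facets": {
        "events_over_time": {
            "date_histogram": { "field": "@timestamp", "interval": "1h" }
        }
    }
}'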

My questions are:

1: How should I tune our cluster to handle these queries? Is our dataset 
simply too big?

2: When this happens, I restart the bad node by:

curl -XPUT "http://$HOST:$PORT/_cluster/settings?pretty"; -d ' {
    "transient": {
        "cluster.routing.allocation.enable": "none"
    }
}'

curl -XPUT "http://$HOST:$PORT/_cluster/settings?pretty"; -d '{
        "transient" : {
            "cluster.routing.allocation.enable" : "none"
        }
}'

(restart the elasticsearch process on the bad node)
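In our case that means something like the following (the exact command depends 
on the init system):

sudo service elasticsearch restart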

curl -XPUT "http://$HOST:$PORT/_cluster/settings?pretty"; -d ' {
    "transient": {
        "cluster.routing.allocation.enable": "all"
    }
}'

It's then an hour or two before the cluster is green again, as the shards 
are assigned and then initialized. Is this the best way to restart a bad 
node?
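
To see where that time goes, the _cat endpoints show the shards being assigned 
and initialized, e.g.:

curl "http://$HOST:$PORT/_cat/health?v"
curl "http://$HOST:$PORT/_cat/recovery?v"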

3: Can I prevent users from making such intensive requests from Kibana 
(either via a Kibana setting or an ES setting)?

Thanks
Wilfred
