If your largest index is only 1GB, then try reducing that to a single shard (with one replica) and see how the performance goes. I'd trial that on a few indices before rolling it out across the board though.
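If you want new daily indices to pick that up automatically, an index template is the usual route. A rough, untested sketch: the template name is arbitrary, the beer.raw.* pattern is only guessed from the index names in the output further down this thread, and the host/port should be whatever you normally hit (your reroute example used 9290):

# Applies to every index created after the template exists and whose name
# matches the pattern; existing indices keep their current 5 shards.
curl -XPUT 'localhost:9200/_template/single_shard_daily' -d '{
  "template": "beer.raw.*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

Templates only affect indices created after they are in place, so the older 5-shard indices will only go away as they age out (or if you reindex them).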
Again, the core issue is that you have too many shards!

On 7 May 2015 at 00:34, Alejandro De Lío <alejandro.de...@gmail.com> wrote:

> Hey guys, thanks for the answers.
>
> Firstly, the index size can vary from index to index (as each index is
> associated with a particular business sector, each of which might be more
> or less active). The max size is approximately 1 GB for the heaviest
> indexes.
>
> Secondly, what do you think would be a sane ratio between shards and
> nodes (we have 5 primary shards per index)?
>
> Background:
> - the storage nodes have 24 GB RAM and 8 cores with 500 GB attached disks
>   (x 12 nodes)
> - we generate around 4,000 shards per day (around 400 indexes with the
>   default configuration of 5 primary shards per index and 1 replica per
>   shard)
> - one index contains only one document type
>
> The reason we chose this "many indexes" design was, fundamentally,
> heterogeneous TTLs. We thought it would be better to drop a bunch of
> small indexes rather than scan a whole giant index (around 3 GB per
> shard), then remove specific events and finally compact it.
>
> Do you think it might be better to have just a few indexes and perform
> sporadic cleanup tasks, rather than partitioning information into lots
> of independent, small, flexible indexes? It should be taken into account
> that some of the current indexes frequently have update-mapping
> operations applied to them, as the document structure might gain new
> fields.
>
> Thanks for your time!
> Ale
>
>
> On Monday, 4 May 2015 at 22:01:46 (UTC-3), Mark Walkom wrote:
>>
>> The rationale of queuing is to allow for instances where temporary load
>> on the cluster might otherwise reject a request.
>> There is no way to prioritise tasks over other tasks.
>>
>> Though it looks like your problem is that you are overloading your
>> nodes. 32192 primary shards is a massive amount for only 12 nodes; you
>> really need to reduce this pretty dramatically to alleviate the
>> pressure.
>>
>> On 5 May 2015 at 07:05, <tomas...@despegar.com> wrote:
>>
>>> Hi all,
>>>
>>> We've been facing some trouble with our Elasticsearch installation
>>> (v 1.5.1), mostly trying to bring it back up. Some questions have
>>> come up.
>>>
>>> This is our situation. We're seeing about 200 unassigned shards, which
>>> are not being reassigned automatically, which in turn leads our ES to
>>> move into a red status. In this state, queries are disabled; however,
>>> we keep storing data.
>>>
>>> What we see is that this generates refresh/update-mapping tasks which
>>> are never resolved (we think this is because ES is in a red state).
>>> Hoping to solve this, we've been running _cluster/reroute on primary
>>> shards:
>>>
>>> curl -XPOST 'localhost:9290/_cluster/reroute' -d '{
>>>   "commands": [
>>>     {
>>>       "allocate": {
>>>         "index": "'$INDEX'",
>>>         "shard": '$SHARD',
>>>         "node": "'$NODE'",
>>>         "allow_primary": true
>>>       }
>>>     }
>>>   ]
>>> }'
>>>
>>> Commands such as _nodes/_local/{stats, settings, os, processes} will
>>> fail on the master node (hang indefinitely).
>>>
>>> We monitor the pending tasks (see _cat/pending_tasks and
>>> _cluster/health output below) and see that the IMMEDIATE tasks are
>>> queued.
>>>
>>> However, we're wondering what the rationale behind the queuing of
>>> these tasks is:
>>>
>>> Is there a round-robin mechanism for the IMMEDIATE tasks, or is there
>>> some way of prioritizing the health state of the cluster over other
>>> tasks? Will an IMMEDIATE task preempt any other (e.g. URGENT or HIGH)?
>>>
>>> We've noticed that when queuing two IMMEDIATE tasks, the second one
>>> may time out if the first one is not resolved fast enough. Is this
>>> queue being consumed by a single thread? If so, is there any way to
>>> change that?
>>>
>>> Thanks in advance!
>>> Tomás
>>> ---
>>>
>>> _cat/pending_tasks
>>>
>>> 220771  2.1m IMMEDIATE cluster_reroute (api)
>>> 220891 29.8s IMMEDIATE cluster_reroute (api)
>>> 196772 10.1h HIGH      update-mapping [beer.raw.cl.business.2015-05-04][GDS_search_scans] / node [v5SxZ7CdRou13tzy-N1DJg], order [1109]
>>> 220892 25.9s IMMEDIATE cluster_reroute (api)
>>> 196773 10.1h HIGH      update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [BTYaSC3cT8K_3xHDQYoNXQ], order [419]
>>> 196779 10.1h HIGH      update-mapping [beer.raw.ar.business.2015-05-04][prism_retrieve] / node [iDVZlJycRdeOa1PGB4Oi9Q], order [127]
>>> 220893 25.7s IMMEDIATE cluster_reroute (api)
>>> 196787 10.1h HIGH      refresh-mapping [beer.raw.pt.business.2015-05-03][[GDS_search_scans]]
>>> 196786 10.1h HIGH      refresh-mapping [beer.raw.ca.business.2015-05-03][[GDS_search_scans]]
>>> 196774 10.1h HIGH      update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [Kx-HMg4qQKqepJb1qjjS3A], order [151]
>>> 196790 10.1h HIGH      refresh-mapping [beer.raw.ae.search.2015-05-03][[vito]]
>>> 196792 10.1h HIGH      refresh-mapping [beer.raw.tr.business.2015-05-03][[GDS_search_scans]]
>>> 218944 35.5m URGENT    shard-started ([beer.raw.gy.performance.2015-04-07][2], node[BTYaSC3cT8K_3xHDQYoNXQ], relocating [0clH8MU6Q5Wt8phPRbuTLg], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[beer-elastic-s-08][0clH8MU6Q5Wt8phPRbuTLg][beer-elastic-s-08][inet[/10.70.163.240:9300]]{master=false}]]
>>> 220894 25.7s IMMEDIATE cluster_reroute (api)
>>> 196850 10.1h HIGH      refresh-mapping [beer.raw.fi.business.2015-05-03][[GDS_search_scans]]
>>> 196788 10.1h HIGH      refresh-mapping [beer.raw.il.business.2015-05-03][[GDS_search_scans]]
>>> 196789 10.1h HIGH      refresh-mapping [beer.raw.nl.business.2015-05-03][[GDS_search_scans]]
>>>
>>>
>>> Health
>>>
>>> _cluster/health?pretty=true
>>>
>>> "cluster_name" : "beer-elastic",
>>> "status" : "red",
>>> "timed_out" : false,
>>> "number_of_nodes" : 20,
>>> "number_of_data_nodes" : 12,
>>> "active_primary_shards" : 32192,
>>> "active_shards" : 64384,
>>> "relocating_shards" : 2,
>>> "initializing_shards" : 14,
>>> "unassigned_shards" : 182,
>>> "number_of_pending_tasks" : 13686
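
On the TTL question in the quoted mail above: dropping whole time-based indices is the right instinct, since deleting an index is one cheap metadata operation, whereas per-document deletes have to be merged away later. A minimal cleanup sketch, assuming your index names all end in a .YYYY-MM-DD suffix like the ones above, a purely illustrative 30-day retention, GNU date, and the default 9200 port:

#!/bin/bash
# Delete every index whose trailing date is older than the retention window.
CUTOFF=$(date -d '30 days ago' +%Y-%m-%d)   # 30 days is just an example
for IDX in $(curl -s 'localhost:9200/_cat/indices?h=index'); do
  SUFFIX=${IDX##*.}   # e.g. 2015-05-04 from beer.raw.cl.business.2015-05-04
  if [[ "$SUFFIX" < "$CUTOFF" ]]; then
    curl -XDELETE "localhost:9200/$IDX"
  fi
done

In practice, Curator handles this sort of age-based deletion (and a lot more) for you.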