If your largest index is only 1GB, then try reducing that to a single shard (with one replica) and see how the performance goes. I'd trial that on a few indices before rolling it out across the board though.
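If you want new daily indices to pick that up automatically, an index template is the usual route. A rough, untested sketch: the template name is arbitrary, the beer.raw.* pattern is only guessed from the index names in the output further down this thread, and the host/port should be whatever you normally hit (your reroute example used 9290):

# Applies to every index created after the template exists and whose name
# matches the pattern; existing indices keep their current 5 shards.
curl -XPUT 'localhost:9200/_template/single_shard_daily' -d '{
  "template": "beer.raw.*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'

Templates only affect indices created after they are in place, so the older 5-shard indices will only go away as they age out (or if you reindex them).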
Again, the core issue is that you have too many shards!

On 7 May 2015 at 00:34, Alejandro De Lío <alejandro.de...@gmail.com> wrote:

> Hey guys, thanks for the answers.
>
> Firstly, the index size can vary from index to index (as each index is
> associated with a particular business sector, each of which might be more
> or less active). The max size is approximately 1 GB for the heaviest
> indexes.
>
> Secondly, what do you think would be a sane ratio between shards and
> nodes (we have 5 primary shards per index)?
>
> Background:
> - the storage nodes have 24 GB RAM and 8 cores with 500 GB attached disks
>   (x 12 nodes)
> - we generate around 4,000 shards per day (around 400 indexes with the
>   default configuration of 5 primary shards per index and 1 replica per
>   shard)
> - one index contains only one document type
>
> The reason we chose this "many indexes" design was, fundamentally,
> heterogeneous TTLs. We thought it would be better to drop a bunch of
> small indexes rather than scan a whole giant index (around 3 GB per
> shard), then remove specific events and finally compact it.
>
> Do you think it might be better to have just a few indexes and perform
> sporadic cleanup tasks, rather than partitioning information into lots
> of independent, small, flexible indexes? It should be taken into account
> that some of the current indexes frequently have update-mapping
> operations applied to them, as the document structure might gain new
> fields.
>
> Thanks for your time!
> Ale
>
>
> On Monday, 4 May 2015 at 22:01:46 (UTC-3), Mark Walkom wrote:
>>
>> The rationale of queuing is to allow for instances where temporary load
>> on the cluster might otherwise reject a request.
>> There is no way to prioritise tasks over other tasks.
>>
>> Though it looks like your problem is that you are overloading your
>> nodes. 32192 primary shards is a massive amount for only 12 nodes; you
>> really need to reduce this pretty dramatically to alleviate the
>> pressure.
>>
>> On 5 May 2015 at 07:05, <tomas...@despegar.com> wrote:
>>
>>> Hi all,
>>>
>>> We've been facing some trouble with our Elasticsearch installation
>>> (v 1.5.1), mostly trying to bring it back up. Some questions have
>>> come up.
>>>
>>> This is our situation. We're seeing about 200 unassigned shards, which
>>> are not being reassigned automatically, which in turn leads our ES to
>>> move into a red status. In this state, queries are disabled; however,
>>> we keep storing data.
>>>
>>> What we see is that this generates refresh/update-mapping tasks which
>>> are never resolved (we think this is because ES is in a red state).
>>> Hoping to solve this, we've been running _cluster/reroute on primary
>>> shards:
>>>
>>> curl -XPOST 'localhost:9290/_cluster/reroute' -d '{
>>>   "commands": [
>>>     {
>>>       "allocate": {
>>>         "index": "'$INDEX'",
>>>         "shard": '$SHARD',
>>>         "node": "'$NODE'",
>>>         "allow_primary": true
>>>       }
>>>     }
>>>   ]
>>> }'
>>>
>>> Commands such as _nodes/_local/{stats, settings, os, processes} will
>>> fail on the master node (hang indefinitely).
>>>
>>> We monitor the pending tasks (see _cat/pending_tasks and
>>> _cluster/health output below) and see that the IMMEDIATE tasks are
>>> queued.
>>>
>>> However, we're wondering what the rationale behind the queuing of
>>> these tasks is:
>>>
>>> Is there a round-robin mechanism for the IMMEDIATE tasks, or is there
>>> some way of prioritizing the health state of the cluster over other
>>> tasks? Will an IMMEDIATE task preempt any other (e.g. URGENT or HIGH)?
>>>
>>> We've noticed that when queuing two IMMEDIATE tasks, the second one
>>> may time out if the first one is not resolved fast enough. Is this
>>> queue being consumed by a single thread? If so, is there any way to
>>> change that?
>>>
>>> Thanks in advance!
>>> Tomás
>>> ---
>>>
>>> _cat/pending_tasks
>>>
>>> 220771  2.1m IMMEDIATE cluster_reroute (api)
>>> 220891 29.8s IMMEDIATE cluster_reroute (api)
>>> 196772 10.1h HIGH      update-mapping [beer.raw.cl.business.2015-05-04][GDS_search_scans] / node [v5SxZ7CdRou13tzy-N1DJg], order [1109]
>>> 220892 25.9s IMMEDIATE cluster_reroute (api)
>>> 196773 10.1h HIGH      update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [BTYaSC3cT8K_3xHDQYoNXQ], order [419]
>>> 196779 10.1h HIGH      update-mapping [beer.raw.ar.business.2015-05-04][prism_retrieve] / node [iDVZlJycRdeOa1PGB4Oi9Q], order [127]
>>> 220893 25.7s IMMEDIATE cluster_reroute (api)
>>> 196787 10.1h HIGH      refresh-mapping [beer.raw.pt.business.2015-05-03][[GDS_search_scans]]
>>> 196786 10.1h HIGH      refresh-mapping [beer.raw.ca.business.2015-05-03][[GDS_search_scans]]
>>> 196774 10.1h HIGH      update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [Kx-HMg4qQKqepJb1qjjS3A], order [151]
>>> 196790 10.1h HIGH      refresh-mapping [beer.raw.ae.search.2015-05-03][[vito]]
>>> 196792 10.1h HIGH      refresh-mapping [beer.raw.tr.business.2015-05-03][[GDS_search_scans]]
>>> 218944 35.5m URGENT    shard-started ([beer.raw.gy.performance.2015-04-07][2], node[BTYaSC3cT8K_3xHDQYoNXQ], relocating [0clH8MU6Q5Wt8phPRbuTLg], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[beer-elastic-s-08][0clH8MU6Q5Wt8phPRbuTLg][beer-elastic-s-08][inet[/10.70.163.240:9300]]{master=false}]]
>>> 220894 25.7s IMMEDIATE cluster_reroute (api)
>>> 196850 10.1h HIGH      refresh-mapping [beer.raw.fi.business.2015-05-03][[GDS_search_scans]]
>>> 196788 10.1h HIGH      refresh-mapping [beer.raw.il.business.2015-05-03][[GDS_search_scans]]
>>> 196789 10.1h HIGH      refresh-mapping [beer.raw.nl.business.2015-05-03][[GDS_search_scans]]
>>>
>>>
>>> Health
>>>
>>> _cluster/health?pretty=true
>>>
>>> "cluster_name" : "beer-elastic",
>>> "status" : "red",
>>> "timed_out" : false,
>>> "number_of_nodes" : 20,
>>> "number_of_data_nodes" : 12,
>>> "active_primary_shards" : 32192,
>>> "active_shards" : 64384,
>>> "relocating_shards" : 2,
>>> "initializing_shards" : 14,
>>> "unassigned_shards" : 182,
>>> "number_of_pending_tasks" : 13686
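
On the TTL question in the quoted mail above: dropping whole time-based indices is the right instinct, since deleting an index is one cheap metadata operation, whereas per-document deletes have to be merged away later. A minimal cleanup sketch, assuming your index names all end in a .YYYY-MM-DD suffix like the ones above, a purely illustrative 30-day retention, GNU date, and the default 9200 port:

#!/bin/bash
# Delete every index whose trailing date is older than the retention window.
CUTOFF=$(date -d '30 days ago' +%Y-%m-%d)   # 30 days is just an example
for IDX in $(curl -s 'localhost:9200/_cat/indices?h=index'); do
  SUFFIX=${IDX##*.}   # e.g. 2015-05-04 from beer.raw.cl.business.2015-05-04
  if [[ "$SUFFIX" < "$CUTOFF" ]]; then
    curl -XDELETE "localhost:9200/$IDX"
  fi
done

In practice, Curator handles this sort of age-based deletion (and a lot more) for you.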