The rationale for queuing is to absorb moments where temporary load on the cluster would otherwise cause a request to be rejected. There is no way to prioritise one task over another.
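As a quick way to see how the queue breaks down by priority level, you can group the `_cat/pending_tasks` output yourself. A minimal sketch, assuming the default column order (`insertOrder timeInQueue priority source`); here a small inline sample stands in for `curl -s 'localhost:9200/_cat/pending_tasks'`:

```shell
# Count queued cluster-state tasks per priority level.
# Each line is one task: <insertOrder> <timeInQueue> <priority> <source...>
printf '%s\n' \
  '220771 2.1m  IMMEDIATE cluster_reroute (api)' \
  '196772 10.1h HIGH      update-mapping [some.index][some_type]' \
  '218944 35.5m URGENT    shard-started ([some.index][2], ...)' \
  '196787 10.1h HIGH      refresh-mapping [some.index][[some_type]]' |
awk '{count[$3]++} END {for (p in count) print p, count[p]}' | sort
# HIGH 2
# IMMEDIATE 1
# URGENT 1
```

With thousands of tasks queued, this makes it easy to spot at a glance how many HIGH mapping updates are sitting behind your IMMEDIATE reroutes.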
Though it looks like your problem is that you are overloading your nodes. 32192 primary shards is a massive amount for only 12 nodes; you really need to reduce this pretty dramatically to alleviate the pressure.

On 5 May 2015 at 07:05, <[email protected]> wrote:

> Hi all,
>
> We've been facing some trouble with our Elasticsearch installation (v 1.5.1), mostly trying to bring it back up. Some questions have come up.
>
> This is our situation. We're seeing about 200 unassigned shards, which are not being reassigned automatically, which in turn leads our ES to move into a red status. In this state, queries are disabled; however, we keep storing data.
>
> What we see is that this generates refresh/update mapping tasks which are never resolved (we think this is because ES is in a red state). Hoping to solve this, we've been running _cluster/reroute on primary shards:
>
> curl -XPOST 'localhost:9290/_cluster/reroute' -d '{
>   "commands": [
>     {
>       "allocate": {
>         "index": "'$INDEX'",
>         "shard": '$SHARD',
>         "node": "'$NODE'",
>         "allow_primary": true
>       }
>     }
>   ]
> }'
>
> Commands such as _nodes/_local/{stats, settings, os, processes} will fail on the master node (hang indefinitely).
>
> We monitor the pending tasks (see _cat/pending_tasks and _cluster/health output below) and see that the IMMEDIATE tasks are queued.
>
> However, we're wondering what's the rationale behind the queuing of these tasks:
>
> Is there a round-robin mechanism for the IMMEDIATE tasks, or is there some way of prioritizing the health state of the cluster over other tasks? Will an IMMEDIATE task preempt any other (e.g. URGENT or HIGH)?
>
> We've noticed that when queuing two IMMEDIATE tasks, the second one may time out if the first one is not resolved fast enough. Is this queue being consumed by a single thread? If so, is there any way to change that?
>
> Thanks in advance!
> Tomás
>
> ---
>
> _cat/pending_tasks
>
> 220771 2.1m  IMMEDIATE cluster_reroute (api)
> 220891 29.8s IMMEDIATE cluster_reroute (api)
> 196772 10.1h HIGH update-mapping [beer.raw.cl.business.2015-05-04][GDS_search_scans] / node [v5SxZ7CdRou13tzy-N1DJg], order [1109]
> 220892 25.9s IMMEDIATE cluster_reroute (api)
> 196773 10.1h HIGH update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [BTYaSC3cT8K_3xHDQYoNXQ], order [419]
> 196779 10.1h HIGH update-mapping [beer.raw.ar.business.2015-05-04][prism_retrieve] / node [iDVZlJycRdeOa1PGB4Oi9Q], order [127]
> 220893 25.7s IMMEDIATE cluster_reroute (api)
> 196787 10.1h HIGH refresh-mapping [beer.raw.pt.business.2015-05-03][[GDS_search_scans]]
> 196786 10.1h HIGH refresh-mapping [beer.raw.ca.business.2015-05-03][[GDS_search_scans]]
> 196774 10.1h HIGH update-mapping [beer.raw.pe.business.2015-05-04][GDS_search_scans] / node [Kx-HMg4qQKqepJb1qjjS3A], order [151]
> 196790 10.1h HIGH refresh-mapping [beer.raw.ae.search.2015-05-03][[vito]]
> 196792 10.1h HIGH refresh-mapping [beer.raw.tr.business.2015-05-03][[GDS_search_scans]]
> 218944 35.5m URGENT shard-started ([beer.raw.gy.performance.2015-04-07][2], node[BTYaSC3cT8K_3xHDQYoNXQ], relocating [0clH8MU6Q5Wt8phPRbuTLg], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[beer-elastic-s-08][0clH8MU6Q5Wt8phPRbuTLg][beer-elastic-s-08][inet[/10.70.163.240:9300]]{master=false}]]
> 220894 25.7s IMMEDIATE cluster_reroute (api)
> 196850 10.1h HIGH refresh-mapping [beer.raw.fi.business.2015-05-03][[GDS_search_scans]]
> 196788 10.1h HIGH refresh-mapping [beer.raw.il.business.2015-05-03][[GDS_search_scans]]
> 196789 10.1h HIGH refresh-mapping [beer.raw.nl.business.2015-05-03][[GDS_search_scans]]
>
> Health
> _cluster/health?pretty=true
>
> "cluster_name" : "beer-elastic",
> "status" : "red",
> "timed_out" : false,
> "number_of_nodes" : 20,
> "number_of_data_nodes" : 12,
> "active_primary_shards" : 32192,
> "active_shards" : 64384,
> "relocating_shards" : 2,
> "initializing_shards" : 14,
> "unassigned_shards" : 182,
> "number_of_pending_tasks" : 13686
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e3f4f449-d359-4ef1-a6a4-a445bb916371%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
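To put the "overloaded nodes" point in numbers, here is a back-of-the-envelope calculation using the figures from the quoted _cluster/health output (integer division is close enough for this purpose):

```shell
# Rough shard load per data node, from the quoted _cluster/health output.
active_shards=64384   # "active_shards" (primaries + replicas)
data_nodes=12         # "number_of_data_nodes"
echo "$(( active_shards / data_nodes )) shards per data node"
# 5365 shards per data node
```

Each shard is a full Lucene index with its own file handles and heap overhead, so thousands of shards on a single node is a lot of fixed cost before any query or indexing work happens.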
