You should create the indices before bulk indexing. First, bulk indexing
works much better if all indices and their mappings are already present: the
operations run faster and without conflicts, and cluster state updates are
less frequent, which reduces noise and hiccups. Second, setting the index
refresh interval to -1 and the number of replicas to 0 while in bulk
indexing mode helps a lot for performance.
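
To illustrate both points, here is a minimal Python sketch using only the
standard library. It assumes a node reachable at http://localhost:9200
without authentication; the helper names (create_index_body,
restore_settings_body, put_json, prepare_indices, finish_bulk) are mine, and
the restore values (1 replica, "1s" refresh) are simply the Elasticsearch
defaults, so adjust them to whatever your indices normally use.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth


def create_index_body(shards, mappings=None):
    """Body for PUT /<index>: create the index up front, in bulk mode."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,   # no replicas while bulk indexing
                "refresh_interval": "-1",  # disable periodic refresh
            }
        }
    }
    if mappings:
        body["mappings"] = mappings
    return body


def restore_settings_body(replicas=1, refresh_interval="1s"):
    """Body for PUT /<index>/_settings once bulk indexing has finished."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def put_json(path, body):
    """Send a JSON body with HTTP PUT and return the decoded response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def prepare_indices(names, shards=3):
    """Create every index before the first bulk request is sent."""
    for name in names:
        put_json("/" + name, create_index_body(shards))


def finish_bulk(names):
    """Re-enable refresh and replicas after the bulk load is done."""
    for name in names:
        put_json("/" + name + "/_settings", restore_settings_body())
```

Call prepare_indices(...) first, run your bulk requests, then call
finish_bulk(...). This way the bulk API never has to auto-create an index.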

If you create 1000+ shards per node, you seem to be exceeding the limits of
your system. Do not expect admin operations like index creation to work in
O(1) time. They are O(n/c), with n = number of affected shards and c = the
threadpool size for the operation (the total node count also matters, but I
neglect it here). So yes, it is expected that index creation operations
take longer when they reach the limit of your nodes, but there can be plenty
of reasons for that (increasing shard count is just one of them). And yes,
it is expected that you see the 30s cluster action timeout in these cases.
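
As a rough back-of-the-envelope illustration of the O(n/c) estimate, here is
a small Python sketch. The figures for indices, shards per index and data
nodes come from your message; the replica count (0) and the threadpool size
(4) are assumptions of mine, and both function names are hypothetical. It
only counts shards and rounds and says nothing about real timings.

```python
import math


def shards_per_node(indices, shards_per_index, data_nodes, replicas=0):
    """Average number of shard copies each data node has to hold."""
    total = indices * shards_per_index * (1 + replicas)
    return total / data_nodes


def rounds_needed(affected_shards, threadpool_size):
    """n/c rounded up: sequential rounds when c shards run in parallel."""
    return math.ceil(affected_shards / threadpool_size)


# 1000 indices with 3 shards each on 3 data nodes, assuming no replicas
print(shards_per_node(1000, 3, 3))

# 100 new indices (300 shards) with an assumed threadpool size of 4
print(rounds_needed(300, 4))
```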

There is no strictly predictable resource limit for a node. It all depends
heavily on factors outside of Elasticsearch (JVM, CPU, memory, disk I/O,
your indexing/search workload), so it is up to you to calibrate your node
capacity. After adding nodes, you will observe that ES scales well and can
handle more shards.

Jörg


On Tue, May 13, 2014 at 11:59 AM, Paul <codive...@gmail.com> wrote:

> We are seeing a slowdown in shard initialization speed as the number of
> shards/indices grows in our cluster.
>
> With zero to a few hundred indices/shards existing in the cluster, a new
> bulk creation of up to hundreds of indices at a time is fine: we see them
> pass through the states and get a green cluster in a reasonable amount of
> time.
>
> As the total cluster size grows to 1000+ indices (3000+ shards), we begin
> to notice that the first rounds of initialization take longer to process.
> It seems to speed up after the first few batches, but this slowdown leads
> to "failed to process cluster event (create-index [index_1112], cause
> [auto(bulk api)]) within 30s" type messages in the master logs. The
> indices are eventually created.
>
>
> Has anyone else experienced this? (did you find the cause / way to fix?)
>
> Is this somewhat expected behaviour? - are we approaching something
> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f34157df-b34e-4d69-a8bd-d8cffb2e5667%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

