I use this on 1.1.0 in my config/elasticsearch.yml:

index:
  merge:
    scheduler:
      type: concurrent
      max_thread_count: 4
    policy:
      type: tiered
      max_merged_segment: 1gb
      segments_per_tier: 4
      max_merge_at_once: 4
      max_merge_at_once_explicit: 4

threadpool:
  merge:
    type: fixed
    size: 4
    queue_size: 32
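Elasticsearch treats nested YAML like the config above as flat, dot-separated setting names (for example index.merge.policy.segments_per_tier). This is the same naming the quoted curl command uses. The minimal sketch below shows how the names line up. The flatten() helper is hypothetical and is not Elasticsearch's own settings loader. The YAML is transcribed by hand into a Python dict so that no YAML library is needed.

```python
def flatten(settings: dict, prefix: str = "") -> dict:
    """Turn nested setting maps into dot-separated keys (hypothetical helper)."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested sections, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hand transcription of the elasticsearch.yml fragment above.
config = {
    "index": {
        "merge": {
            "scheduler": {"type": "concurrent", "max_thread_count": 4},
            "policy": {
                "type": "tiered",
                "max_merged_segment": "1gb",
                "segments_per_tier": 4,
                "max_merge_at_once": 4,
                "max_merge_at_once_explicit": 4,
            },
        }
    },
    "threadpool": {"merge": {"type": "fixed", "size": 4, "queue_size": 32}},
}

for name, value in flatten(config).items():
    print(f"{name}: {value}")
```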


Explanation:

- use the concurrent merge scheduler and limit it to 4 threads. I find that
4 threads can keep up with the highest bulk indexing rate I could generate
- use the tiered merge policy (the default; it is the most flexible in
selecting segments to merge)
- keep merged segments below 1gb (this limits the size of the segment
files: the smaller the files, the faster the merges, but the more files
are created)
- allow 4 segments per tier (keeps the number of segments per tier from
getting too high)
- merge at most 4 segments in each merge step (this limits the run time
and resource consumption of a single merge step)
- apply the same limit of 4 segments per merge step to explicit merges
triggered by the _optimize API
- set the merge thread pool to 4 threads with a queue of at most 32
pending merge operations (32 should be sufficient to hold outstanding
merges)
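A fixed thread pool with size 4 and queue_size 32 runs at most 4 tasks at once and lets up to 32 more wait. In Elasticsearch, a fixed pool with a bounded queue rejects further work once that queue is full. The toy model below illustrates only those semantics. FixedPool is a hypothetical class, not Elasticsearch code, and it does not model what the merge machinery does with a rejected task.

```python
from collections import deque


class FixedPool:
    """Toy model of a fixed-size pool with a bounded wait queue (hypothetical)."""

    def __init__(self, size: int, queue_size: int) -> None:
        self.size = size              # max tasks running at the same time
        self.queue_size = queue_size  # max tasks waiting for a free thread
        self.running: list[str] = []
        self.queue: deque[str] = deque()
        self.rejected: list[str] = []

    def submit(self, task: str) -> bool:
        """Run the task if a thread is free, else queue it, else reject it."""
        if len(self.running) < self.size:
            self.running.append(task)
            return True
        if len(self.queue) < self.queue_size:
            self.queue.append(task)
            return True
        self.rejected.append(task)
        return False

    def finish(self, task: str) -> None:
        """Mark a running task as done and start the oldest queued task."""
        self.running.remove(task)
        if self.queue:
            self.running.append(self.queue.popleft())


# Submit 40 merge tasks to a pool configured like threadpool.merge above.
pool = FixedPool(size=4, queue_size=32)
accepted = [pool.submit(f"merge-{i}") for i in range(40)]
print(sum(accepted), len(pool.running), len(pool.queue), len(pool.rejected))
# prints: 36 4 32 4
```

With these numbers, 4 tasks run, 32 wait, and the last 4 submissions are turned away. This is the trade-off behind picking a queue size large enough for the expected backlog of outstanding merges.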

From time to time, if the number of files gets very high (>500) and the
index is calm (no indexing, no heavy search), I run a manual _optimize.
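The rule of thumb above can be sketched as a small check. This is a minimal sketch under stated assumptions. The caller supplies the file count and the activity flags, and how to collect them is left open. The index name "myindex" and the file count 612 are made-up example values. max_num_segments is an optional _optimize parameter, and the text does not say which value, if any, is used, so the sketch leaves it unset by default.

```python
from typing import Optional
from urllib.parse import urlencode

FILE_COUNT_THRESHOLD = 500  # the ">500 files" rule of thumb from the text


def should_optimize(file_count: int, indexing_active: bool,
                    heavy_search_active: bool,
                    threshold: int = FILE_COUNT_THRESHOLD) -> bool:
    """Return True only if the index has many files AND is calm."""
    is_calm = not indexing_active and not heavy_search_active
    return file_count > threshold and is_calm


def optimize_url(host: str, index: str,
                 max_num_segments: Optional[int] = None) -> str:
    """Build the URL for a manual _optimize request (to be sent with POST)."""
    url = f"{host.rstrip('/')}/{index}/_optimize"
    if max_num_segments is not None:
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return url


# Example values only: "myindex" and 612 files are hypothetical.
if should_optimize(file_count=612, indexing_active=False,
                   heavy_search_active=False):
    print("POST", optimize_url("http://localhost:9200", "myindex"))
```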

Jörg


On Fri, Apr 18, 2014 at 9:01 PM, David Smith <davidksmit...@gmail.com> wrote:

> I see that ES switched back to ConcurrentMergeScheduler in 1.1.1 due to it
> affecting indexing performance in 1.1.0.
> https://github.com/elasticsearch/elasticsearch/issues/5817
>
> We're on 1.1.0 and cannot upgrade to 1.1.1 for the time being. Is there a
> way to switch it back using the API? I tried the following command, but it
> seems to not take.
>
> curl -i -XPUT localhost:9200/_cluster/settings -d '{ "persistent": {
> "index.merge.scheduler.type":
> "org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider"
> } }'
> HTTP/1.1 200 OK
> Content-Type: application/json; charset=UTF-8
> Content-Length: 52
>
> {"acknowledged":true,"persistent":{},"transient":{}}
>
>
> It does not seem to be set when I try to re-GET it (and no errors in logs
> at DEBUG level or above).
>
> curl -i -XGET localhost:9200/_cluster/settings
> HTTP/1.1 200 OK
> Content-Type: application/json; charset=UTF-8
> Content-Length: 66
>
> {"persistent":{"threadpool":{"bulk":{"size":"8"}}},"transient":{}}
>
>
> Am I using the wrong way of specifying the scheduler? I also tried just
> specifying ConcurrentMergeSchedulerProvider instead of the full class
> name, but that didn't work.
>
> Any ideas?
> David
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/601a831d-2c8e-4615-b816-435a6d4e4d9c%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/601a831d-2c8e-4615-b816-435a6d4e4d9c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

