I use this on 1.1.0 in my config/elasticsearch.yml
index:
  merge:
    scheduler:
      type: concurrent
      max_thread_count: 4
    policy:
      type: tiered
      max_merged_segment: 1gb
      segments_per_tier: 4
      max_merge_at_once: 4
      max_merge_at_once_explicit: 4
threadpool:
  merge:
    type: fixed
    size: 4
    queue_size: 32
Explanation:
- use the concurrent scheduler and limit it to 4 threads. I find that 4
threads can keep up with the highest bulk insertion rate I could generate
- use the tiered policy (the default; it is the most flexible in selecting
segments to merge)
- limit merged segments to 1gb (this limits the size of the segment files;
the smaller the files, the faster the merges, but the more files are
created)
- allow 4 segments per tier (do not set the number of segments per tier too
high)
- merge at most 4 segments in each merge step (this limits the total run
time and resource consumption of a segment merge step)
- apply the same limit to merges triggered by an explicit _optimize API call
- extend the thread pool to 4 merge threads with a maximum of 32 merge
operations in the queue (32 should be sufficient to handle outstanding
merges)
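
The policy values above can probably also be applied to an existing index without editing the config file. This is only a minimal sketch, assuming the tiered merge policy settings are dynamically updatable through the index update settings API on the 1.x line; the scheduler type is assumed not to be changeable this way and stays in elasticsearch.yml. The host, the index name myindex, and the function names are placeholders, not taken from this thread.

```shell
#!/bin/sh
# Placeholders: adjust host and index name to your cluster.
ES_HOST="localhost:9200"
INDEX="myindex"

# JSON body with the same tiered policy values as the YAML above,
# written as flat dotted setting names.
merge_policy_body() {
  cat <<'EOF'
{
  "index.merge.policy.max_merged_segment": "1gb",
  "index.merge.policy.segments_per_tier": 4,
  "index.merge.policy.max_merge_at_once": 4,
  "index.merge.policy.max_merge_at_once_explicit": 4
}
EOF
}

# Send the body to the index settings endpoint (not to _cluster/settings).
apply_merge_policy() {
  curl -s -XPUT "http://${ES_HOST}/${INDEX}/_settings" -d "$(merge_policy_body)"
}

# Read the index settings back to check whether the values were accepted.
show_index_settings() {
  curl -s -XGET "http://${ES_HOST}/${INDEX}/_settings?pretty"
}
```

Nothing is sent until apply_merge_policy is called; show_index_settings afterwards shows whether the node accepted the values.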
From time to time, if the number of files gets very high (500) and the index
is calm (no indexing, no heavy search), I do a manual _optimize.
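
A rough sketch of that manual step, assuming the _segments and _optimize index APIs of the 1.x line. The host, the index name myindex, and the function names are placeholders. Note that _segments reports Lucene segments per shard, not files on disk, so it is only a proxy for the file count mentioned above.

```shell
#!/bin/sh
# Placeholders: adjust host and index name to your cluster.
ES_HOST="localhost:9200"
INDEX="myindex"

# URL that lists the Lucene segments of every shard of the index.
segments_url() {
  echo "http://${ES_HOST}/${INDEX}/_segments?pretty"
}

# URL for an explicit optimize; $1 is the target number of segments per shard.
optimize_url() {
  echo "http://${ES_HOST}/${INDEX}/_optimize?max_num_segments=$1"
}

# Inspect the segments first, then optimize only while the index is idle.
show_segments() { curl -s -XGET "$(segments_url)"; }
run_optimize()  { curl -s -XPOST "$(optimize_url "$1")"; }
```

For example, run_optimize 1 asks for one segment per shard. That is the most expensive choice, so a larger target may be the gentler option on big indices.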
Jörg
On Fri, Apr 18, 2014 at 9:01 PM, David Smith davidksmit...@gmail.com wrote:
I see that ES switched back to ConcurrentMergeScheduler in 1.1.1 because the
change in 1.1.0 affected indexing performance.
https://github.com/elasticsearch/elasticsearch/issues/5817
We're on 1.1.0 and cannot upgrade to 1.1.1 for the time being. Is there a
way to switch it back using the API? I tried the following command, but it
does not seem to take effect.
curl -i -XPUT localhost:9200/_cluster/settings -d '{ "persistent": {
"index.merge.scheduler.type":
"org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider"
} }'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 52
{"acknowledged":true,"persistent":{},"transient":{}}
It does not seem to be set when I try to re-GET it (and no errors in logs
at DEBUG level or above).
curl -i -XGET localhost:9200/_cluster/settings
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 66
{"persistent":{"threadpool":{"bulk":{"size":"8"}}},"transient":{}}
Am I specifying the scheduler the wrong way? I also tried just specifying
ConcurrentMergeSchedulerProvider instead of the full class name, but that
didn't work.
Any ideas?
David
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/601a831d-2c8e-4615-b816-435a6d4e4d9c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.