OK I ran a quick test using Wikipedia docs; net/net I think
TieredMergePolicy's (the default) behavior is fine.  Once a too-large
segment has > 50% deletes it is eligible for merging and will be
aggressively merged.
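
As a rough illustration of that rule, here is a simplified sketch, not the actual Lucene code. The function and parameter names are made up for this example, and it assumes a segment counts as "too large" while its live (non-deleted) size is at least half of the max merged segment size:

```python
def live_size_mb(size_mb, deleted_fraction):
    """Segment size pro-rated by the fraction of docs that are still live."""
    return size_mb * (1.0 - deleted_fraction)


def is_merge_eligible(size_mb, deleted_fraction, max_merged_mb=800.0):
    """Return True once a segment is no longer considered "too large".

    Assumption (hypothetical simplification): a segment is "too large"
    while its live size is at least half of the max merged segment size,
    so a max-size segment becomes eligible once > 50% of it is deleted.
    """
    return live_size_mb(size_mb, deleted_fraction) < max_merged_mb / 2.0


if __name__ == "__main__":
    # A segment at the 800 MB cap with varying delete percentages.
    for pct in (30, 50, 60):
        print(pct, is_merge_eligible(800.0, pct / 100.0))
```

Under that assumption, a segment sitting at the cap stays out of merge selection until more than half of its docs are deleted, which is consistent with the behavior described above.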

To visualize this, I first built a 33.3M doc Wikipedia index (append
only), then ran an endless loop randomly replacing docs, which is a
worst-case test since every update also deletes a previous doc.

I set max merged segment size to 800 MB, so I had a good number (17)
of them; otherwise I left TMP at defaults.

I refreshed every 3 seconds, and plotted the resulting graph of % of
deleted but not yet merged docs over time:



It ramps up quickly from 0 at the start, falls again only once the
too-large segments start being merged, and eventually stabilizes in
a fairly narrow range of 33%-45%.
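
For reference, a minimal sketch of how such a percentage could be computed. It assumes "% deleted" means deleted docs divided by all docs still held in the index (live plus deleted but not yet merged away); the function name is made up for this example:

```python
def percent_deleted(live_docs, deleted_docs):
    """Percentage of docs in the index that are deleted but not yet merged away.

    Assumption: the denominator is live docs plus deleted docs.
    """
    total = live_docs + deleted_docs
    if total == 0:
        return 0.0  # empty index: nothing deleted
    return 100.0 * deleted_docs / total


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration.
    print(percent_deleted(60, 40))
```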

Mike McCandless

http://blog.mikemccandless.com

On Thu, Dec 4, 2014 at 5:30 AM, Michael McCandless <m...@elasticsearch.com>
wrote:

> 25-40% is definitely "normal" for an index where many docs are being
> replaced; I've seen this go up to ~65% before large merges bring it back
> down.
>
> On 2) there may be some improvements we can make to Lucene's default
> TieredMergePolicy here, to reclaim deletes for the "too large" segments ...
> I'll have a look.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Dec 4, 2014 at 4:06 AM, Michal Taborsky <michal.tabor...@gmail.com>
> wrote:
>
>> Hello Nikolas,
>>
>> we are facing similar behavior. Did you find out anything?
>>
>> Thank you,
>> Michal
>>
>> On Monday, September 8, 2014 at 22:55:12 UTC+2, Nikolas Everett wrote:
>>
>>> My indexes change somewhat frequently.  If I leave the merge
>>> settings as the default I end up with 25%-40% deleted documents (some
>>> indexes higher, some lower).  I'm looking for some generic advice on:
>>> 1.  Is that 25%-40% ok?
>>> 2.  What kind of settings should I set to keep that in an acceptable
>>> range?  For some meaning of acceptable.
>>>
>>> On (1) I'm pretty sure 25%-40% is OK for my low query traffic indexes -
>>> no use optimizing them anyway.  But for my high search traffic indexes I
>>> _think_ I see a performance improvement when I have lower (<5%) deleted
>>> documents and fewer segments.  But computers are complicated and my
>>> performance tests might just have been testing cache warming....  Does this
>>> conclusion match others' experience?
>>>
>>> On (2) I'm not really sure what to do.  It _looks_ _like_ Lucene isn't
>>> picking up the bigger segments to merge the deletes out of them.  I assume
>>> that is because they are bumping against the max allowed segment size and
>>> therefore it can only merge one at a time so it always has something better
>>> to do.  I'm not sure that is healthy though.  Some of those old segments
>>> can get really bloated - like 40%-50% deleted.
>>>
>>> Thanks!
>>>
>>> Nik
>>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/faec06a2-c352-4e3e-bea0-41ace2b35d6f%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/faec06a2-c352-4e3e-bea0-41ace2b35d6f%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

