[
https://issues.apache.org/jira/browse/LUCENE-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-854:
--------------------------------------
Fix Version/s: (was: 2.2)
> Create merge policy that doesn't periodically inadvertently optimize
> --------------------------------------------------------------------
>
> Key: LUCENE-854
> URL: https://issues.apache.org/jira/browse/LUCENE-854
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.2
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
>
> The current merge policy, at every maxBufferedDocs *
> power-of-mergeFactor docs added, will do a fully cascaded merge, which
> is the same as an optimize.
> I think this is not good because at that "optimization poin", the
> particular addDocument call is [surprisingly] very expensive. While,
> amortized over all addDocument calls, the cost is low, the cost is
> paid "up front" and in a very "bunched up" manner.
> I think of this as "pay it forward": you are paying the full cost of
> an optimize right now on the expectation / hope that you will be
> adding a great many more docs. But, if you don't add that many more
> docs, then, the amortized cost for your index is in fact far higher
> than it should have been. Better to "pay as you go" instead.
> So we could make a small change to the policy by only merging the
> first mergeFactor segments once we hit 2X the merge factor. With
> mergeFactor=10, when we have created the 20th level 0 (just flushed)
> segment, we merge the first 10 into a level 1 segment. Then on
> creating another 10 level 0 segments, we merge the second set of 10
> level 0 segments into a level 1 segment, etc.
> With this new merge policy, an index that's a bit bigger than a
> current "optimization point" would then have a lower amortized cost
> per document. Plus the merge cost is less "bunched up" and less "pay
> it forward": instead you pay for what you are actually using.
> We can start by creating this merge policy (probably, combined with
> with the "by size not by doc count" segment level computation from
> LUCENE-845) and then later decide whether we should make it the
> default merge policy.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]