[ https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1076:
---------------------------------------

    Attachment: LUCENE-1076.patch

Patch.  I think it's ready to commit...

To stress-test out-of-order (ooo) merging, I made a fun new
MockRandomMergePolicy (swapped in half the time by
LTC.newIndexWriterConfig, where LTC is LuceneTestCase) that randomly
decides when to do a merge, then randomly picks how many segments to
merge, and then randomly picks which ones.  I also modified
LogMergePolicy to add a boolean setting, get/setRequireContiguousMerge,
which defaults to false.
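The random selection described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual MockRandomMergePolicy: segments are plain names rather than SegmentInfo objects, and `RandomMergeSketch`/`pickMerge` are made-up names. The `requireContiguous` flag mirrors the idea behind get/setRequireContiguousMerge: when it is set, only a run of adjacent segments may be picked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of random merge selection; not Lucene's actual code.
public class RandomMergeSketch {

  /**
   * Returns the segments chosen for one merge, or an empty list if no
   * merge is selected this time.
   */
  static List<String> pickMerge(List<String> segments, Random random,
                                boolean requireContiguous) {
    // Randomly decide whether to merge at all (and never merge < 2 segments).
    if (segments.size() < 2 || random.nextBoolean()) {
      return Collections.emptyList();
    }
    // Randomly pick how many segments to merge: 2..size, inclusive.
    int count = 2 + random.nextInt(segments.size() - 1);
    if (requireContiguous) {
      // In-order merging: pick a run of adjacent segments.
      int start = random.nextInt(segments.size() - count + 1);
      return new ArrayList<>(segments.subList(start, start + count));
    }
    // Out-of-order merging: any subset of the segments may be chosen.
    List<String> shuffled = new ArrayList<>(segments);
    Collections.shuffle(shuffled, random);
    return new ArrayList<>(shuffled.subList(0, count));
  }

  public static void main(String[] args) {
    List<String> segs = List.of("_0", "_1", "_2", "_3", "_4");
    Random random = new Random(42);
    for (int i = 0; i < 5; i++) {
      // Prints either [] (no merge) or the randomly chosen segments.
      System.out.println(pickMerge(segs, random, false));
    }
  }
}
```

Seeding the `Random` makes a failing selection reproducible, which is the usual reason randomized test components take the test's random source rather than creating their own.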

Many tests rely on in-order docIDs during merging, so I had to wire
them to use in-order LogMP.

I also reworked how buffered deletes are managed: each "packet" of
buffered deletes, as well as each flushed segment, is now assigned an
incrementing gen.  This makes the algorithm for applying deletes
simple: only the delete packets with gen >= a segment's gen are
coalesced and applied to that segment.
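The gen rule above can be sketched as follows. This is a simplified, hypothetical model, not IndexWriter's real code: `DeleteGenSketch`, `DeletePacket`, `Segment`, and the method names are invented for illustration, deletes are plain term strings, and records require Java 16+. Packets and segments draw from one shared counter, so a segment picks up exactly the packets buffered after it was flushed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of gen-based delete application; names are illustrative.
public class DeleteGenSketch {

  /** A "packet" of buffered delete terms, stamped with a gen. */
  record DeletePacket(long gen, Set<String> terms) {}

  /** A flushed segment, stamped with a gen from the same counter. */
  record Segment(String name, long gen) {}

  private long nextGen = 1;
  private final List<DeletePacket> packets = new ArrayList<>();

  /** Buffers a packet of deletes and assigns it the next gen. */
  long pushDeletes(Set<String> terms) {
    long gen = nextGen++;
    packets.add(new DeletePacket(gen, Set.copyOf(terms)));
    return gen;
  }

  /** Records a flushed segment, assigning it the next gen. */
  Segment flushSegment(String name) {
    return new Segment(name, nextGen++);
  }

  /** Coalesces every packet with gen >= the segment's gen into one set. */
  Set<String> deletesFor(Segment segment) {
    Set<String> coalesced = new HashSet<>();
    for (DeletePacket packet : packets) {
      if (packet.gen() >= segment.gen()) {
        coalesced.addAll(packet.terms());
      }
    }
    return coalesced;
  }
}
```

For example, pushing deletes for `id:1`, flushing `_0`, pushing `id:2`, flushing `_1`, then pushing `id:3` means `_0` receives `id:2` and `id:3`, while `_1` receives only `id:3`. Because the rule depends only on gens, it does not care whether segments are later merged in order or out of order.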

Separately, I'd eventually like to switch to a better default MP,
something like BSMP (BalancedSegmentMergePolicy), where immense merges
are done w/ a small mergeFactor (e.g., 2) and tiny merges are done w/ a
large mergeFactor.


> Allow MergePolicy to select non-contiguous merges
> -------------------------------------------------
>
>                 Key: LUCENE-1076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1076
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-1076.patch, LUCENE-1076.patch
>
>
> I started work on this, but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and fixing LogMergePolicy to optionally allow
> it the freedom to do so.
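What "contiguous" means here can be made concrete with a small check. This is a hypothetical helper (`ContiguitySketch`/`isContiguous` are not part of Lucene's API), and it models segments as names in index order. A merge is contiguous when its segments form an in-order run of adjacent positions. Replacing such a run with the merged segment preserves the order in which documents were added ("temporal monotonicity"), while a non-contiguous merge does not.

```java
import java.util.List;

// Hypothetical helper illustrating contiguity; not part of Lucene's API.
public class ContiguitySketch {

  /**
   * Returns true if merge is a run of adjacent segments, in order, within
   * segments (the index's segments, oldest first).
   */
  static boolean isContiguous(List<String> segments, List<String> merge) {
    if (merge.isEmpty()) {
      return false;
    }
    int start = segments.indexOf(merge.get(0));
    if (start < 0 || start + merge.size() > segments.size()) {
      return false;
    }
    // The merged segments must match the run starting at their first member.
    return segments.subList(start, start + merge.size()).equals(merge);
  }
}
```

With segments `_0 _1 _2 _3`, merging `[_1, _2]` is contiguous, but `[_0, _2]` is not: after that merge, the documents of `_1` would no longer sit between those of `_0` and `_2` in docID order.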

-- 
This message is automatically generated by JIRA.