[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988458#action_12988458
 ] 

Michael McCandless commented on LUCENE-1076:
--------------------------------------------

bq. I saw this and thought it was interesting. Why is the gen needed?

At first I added it because pushing merged delete packets got too
hairy; e.g., when merges interleave you'd have to handle deletes
being pushed onto each other's internal merged segments.

Also, we really needed a transactional data structure here, because
previously DW (DocumentsWriter) could push more deletes into an
existing packet (i.e. the packet was not write-once), which made
tracking problematic if the merge wanted to record that the first
batch of deletes had been applied but not any subsequent pushes.

But, after making the change, I realized that today (trunk, 3.1) we
are badly inefficient!  We apply deletes to segments being merged, but
then we place the merged segment back in the same position.  Later,
when this segment gets merged again, we wastefully re-apply the same
deletes (plus new ones, which do need to be applied).

So, by decoupling the tracking of where you are in the delete packet
stream from the physical location of your segment in the index, we
fix this waste.  Also, it's quite a bit simpler now -- we no longer
have to merge deletes on completing a merge.
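
To make the idea concrete, here is a minimal sketch of generation-based
tracking.  It is not the code in the patch: the class and method names
(DeleteGenSketch, DeletePacket, DeleteStream, Segment, appliedGen) are
hypothetical, and "applying" a packet is reduced to advancing a counter.
It only assumes what is described above: packets are write-once, each
gets a monotonically increasing gen, and a segment remembers the last
gen applied to it no matter where it sits in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names are Lucene's actual API.
final class DeleteGenSketch {

    /** A write-once batch of deletes, stamped with the gen at which it was pushed. */
    static final class DeletePacket {
        final long gen;
        final List<String> terms;

        DeletePacket(long gen, List<String> terms) {
            this.gen = gen;
            this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
        }
    }

    /** A segment tracks its place in the packet stream, not its index position. */
    static final class Segment {
        final String name;
        long appliedGen; // every packet with gen <= appliedGen is already applied

        Segment(String name, long appliedGen) {
            this.name = name;
            this.appliedGen = appliedGen;
        }
    }

    static final class DeleteStream {
        private final List<DeletePacket> packets = new ArrayList<>();
        private long lastGen = 0;

        /** Pushes a new packet; existing packets are never modified. */
        long push(List<String> terms) {
            packets.add(new DeletePacket(++lastGen, terms));
            return lastGen;
        }

        /** Packets that still need to be applied to the given segment. */
        List<DeletePacket> pending(Segment seg) {
            List<DeletePacket> out = new ArrayList<>();
            for (DeletePacket p : packets) {
                if (p.gen > seg.appliedGen) {
                    out.add(p);
                }
            }
            return out;
        }

        /** Applies pending deletes to the inputs and returns the merged segment. */
        Segment merge(String name, List<Segment> inputs) {
            for (Segment s : inputs) {
                // A real implementation would resolve pending(s) against s's docs here.
                s.appliedGen = lastGen;
            }
            // The merged segment inherits the current gen, so nothing is re-applied later.
            return new Segment(name, lastGen);
        }
    }

    public static void main(String[] args) {
        DeleteStream stream = new DeleteStream();
        Segment a = new Segment("_a", 0);
        Segment b = new Segment("_b", 0);
        stream.push(Arrays.asList("id:1"));                  // gen 1
        Segment m = stream.merge("_m", Arrays.asList(a, b)); // gen 1 applied during merge
        stream.push(Arrays.asList("id:2"));                  // gen 2
        System.out.println(stream.pending(m).size());        // only the gen-2 packet remains
    }
}
```

In this sketch the merged segment carries the gen it was merged at, so a
later merge only sees packets pushed after that point, and there is no
step that merges packets together when a merge completes.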


> Allow MergePolicy to select non-contiguous merges
> -------------------------------------------------
>
>                 Key: LUCENE-1076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1076
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-1076.patch, LUCENE-1076.patch
>
>
> I started work on this but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and fixing LogMergePolicy to optionally select
> such merges.


