[ 
https://issues.apache.org/jira/browse/LUCENE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708898#action_12708898
 ] 

Michael McCandless commented on LUCENE-1634:
--------------------------------------------

Actually, optimize() always merges all segments down to 1,
irrespective of deletes.  I think you're referring to "normal" merges?

I think this approach is reasonable and a good step forward.  Can you
make this behaviour get/settable?  I think we should default to the
old behaviour until 3.0, and then switch it to default to the new one
at 3.0 (to preserve back compat).
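
To make the request concrete, here is a minimal sketch of what a get/settable switch with the old behaviour as the default might look like. It is not the attached patch and not Lucene's LogMergePolicy: the class SegmentSizer, the property name calibrateSizeByDeletes, and the size() signature are all hypothetical names chosen for illustration.

```java
// Illustrative only: a stand-alone class, not Lucene's LogMergePolicy
// and not the attached patch. All names here are hypothetical.
final class SegmentSizer {
    // Off by default, so segment sizing matches the old behaviour
    // unless the caller opts in.
    private boolean calibrateSizeByDeletes = false;

    boolean getCalibrateSizeByDeletes() {
        return calibrateSizeByDeletes;
    }

    void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes) {
        this.calibrateSizeByDeletes = calibrateSizeByDeletes;
    }

    /**
     * Returns the size the merge policy should use for a segment.
     *
     * @param sizeInBytes on-disk size of the segment
     * @param maxDoc      total docs in the segment, including deleted ones
     * @param delCount    number of deleted docs in the segment
     */
    long size(long sizeInBytes, int maxDoc, int delCount) {
        if (!calibrateSizeByDeletes || maxDoc <= 0) {
            return sizeInBytes; // old behaviour: raw size
        }
        // Scale the raw size by the fraction of docs that are still live.
        double liveRatio = (double) (maxDoc - delCount) / maxDoc;
        return Math.round(sizeInBytes * liveRatio);
    }
}
```

With the flag off, a 1000-byte segment with 75 of 100 docs deleted is still sized at 1000; with it on, the same segment is sized at 250, so it looks small enough to be picked up by a merge.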

Thinking more about this... I think we can further improve how
Log*MergePolicy takes deletes into account.  I.e., why not explicitly
measure the deletions and bias merge selection to favor merging away
the segments that have the most deletions?

This might require relaxing the merge policy so that it's allowed to
pick fewer than mergeFactor segments to merge at once; perhaps it's
given a min/max mergeFactor.
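
A rough sketch of that idea, under stated assumptions: the class DeleteBiasedSelector, its Seg type, and the minDeleteRatio threshold are hypothetical and do not exist in Lucene. The sketch also ignores segment sizes and any constraint on which segments may be merged together; it only shows ranking by delete ratio with a min/max bound on how many segments are picked.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types, not a Lucene MergePolicy.
final class DeleteBiasedSelector {

    /** Minimal stand-in for per-segment statistics. */
    static final class Seg {
        final String name;
        final int maxDoc;    // total docs, including deleted ones
        final int delCount;  // deleted docs

        Seg(String name, int maxDoc, int delCount) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        double deleteRatio() {
            return maxDoc == 0 ? 0.0 : (double) delCount / maxDoc;
        }
    }

    /**
     * Picks between minMergeFactor and maxMergeFactor segments, preferring
     * those with the highest fraction of deleted docs. Returns an empty
     * list when too few segments reach minDeleteRatio.
     */
    static List<Seg> select(List<Seg> segments, int minMergeFactor,
                            int maxMergeFactor, double minDeleteRatio) {
        List<Seg> candidates = new ArrayList<>();
        for (Seg s : segments) {
            if (s.deleteRatio() >= minDeleteRatio) {
                candidates.add(s);
            }
        }
        if (candidates.size() < minMergeFactor) {
            return new ArrayList<>(); // not enough to justify a merge
        }
        // Highest delete ratio first.
        candidates.sort((a, b) -> Double.compare(b.deleteRatio(), a.deleteRatio()));
        int n = Math.min(maxMergeFactor, candidates.size());
        return new ArrayList<>(candidates.subList(0, n));
    }
}
```

Allowing fewer than maxMergeFactor segments is what lets a couple of heavily deleted segments be merged away without waiting for a full set of similarly sized ones to accumulate.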

Likely such a change should be a new merge policy...


> LogMergePolicy should use the number of deleted docs when deciding which 
> segments to merge
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1634
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Yasuhiro Matsuda
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1634.patch
>
>
> I found that the IndexWriter.optimize(int) method does not pick up large
> segments with a lot of deletes, even when most of their docs are deleted. The
> existence of such segments affected query performance significantly.
> I created an index with 1 million docs, then went over all docs and updated a
> few thousand at a time.  I ran optimize(20) occasionally. What I saw were large
> segments with most of their docs deleted. Although these segments did not have
> valid docs, they remained in the directory for a very long time, until more
> segments of comparable or bigger size were created.
> This is because LogMergePolicy.findMergeForOptimize uses the size of segments 
> but does not take the number of deleted documents into consideration when it 
> decides which segments to merge. So, a simple fix is to use the delete count 
> to calibrate the segment size. I can create a patch for this.


