[ 
https://issues.apache.org/jira/browse/LUCENE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709003#action_12709003
 ] 

John Wang commented on LUCENE-1634:
-----------------------------------

>> So let's proceed with this patch, once you've added setter/getter.
Can you please elaborate on this? Add setter/getter on what? The number of 
target segment is already an input parameter? do you mean some sort of 
normalization factor on the how much to "punish" segments with high deleted 
docs?

> LogMergePolicy should use the number of deleted docs when deciding which 
> segments to merge
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1634
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Yasuhiro Matsuda
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1634.patch
>
>
> I found that IndexWriter.optimize(int) method does not pick up large segments 
> with a lot of deletes even when most of the docs are deleted. And the 
> existence of such segments affected the query performance significantly.
> I created an index with 1 million docs, then went over all docs and updated a 
> few thousand at a time.  I ran optimize(20) occasionally. What saw were large 
> segments with most of docs deleted. Although these segments did not have 
> valid docs they remained in the directory for a very long time until more 
> segments with comparable or bigger sizes were created.
> This is because LogMergePolicy.findMergeForOptimize uses the size of segments 
> but does not take the number of deleted documents into consideration when it 
> decides which segments to merge. So, a simple fix is to use the delete count 
> to calibrate the segment size. I can create a patch for this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to