[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776543#comment-16776543
 ] 

Armin Braun edited comment on LUCENE-8688 at 2/25/19 6:20 AM:
--------------------------------------------------------------

Gave this a shot in the attached patch:
 * Basically brought back the old logic (pre LUCENE-7976) of simply collecting 
as many of the smallest segments as possible ("possible" now including the max 
segment size check).
 ** Made the tradeoff of merging the smallest remaining segments  to get to the 
requested segment count
 ** Technically speaking one could do better than the above trade-off (in some 
cases) by using a smarter bin-packing algorithm but the above comment described 
merging large segments close to bin-size with tiny segments as wasteful so I 
didn't try that
 * Added a new rough test that checks that we arrive at the exact max segment 
count and don't exceed max segment size significantly (copied first few lines 
from the above test and should probably clean this up a little if the approach 
is ok)

 ** It's much stricter than the existing test for this size-wise

[^LUCENE-8688.patch]


was (Author: original-brownbear):
Gave this a shot in the attached patch:
 * Basically brought back the old logic (pre LUCENE-7976) of simply collecting 
as many of the smallest segments as possible ("possible" now including the max 
segment size check).
 ** Made the tradeoff of merging the smallest remaining segments  to get to the 
requested segment count
 ** Technically speaking one could do better than the above trade-off (in some 
cases) by using a smarter bin-packing algorithm but the above comment described 
merging large segments close to bin-size with tiny segments as wasteful so I 
didn't try that
 * Added a new rough test that checks that we arrive at the exact max segment 
count and don't exceed max segment size significantly 
 ** It's much stricter than the existing test for this size-wise

[^LUCENE-8688.patch]

> Forced merges merge more than necessary
> ---------------------------------------
>
>                 Key: LUCENE-8688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8688
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to