[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510541#comment-16510541 ]

Erick Erickson commented on LUCENE-7976:
----------------------------------------

bq. If we fix the above loop to not add the singleton merge unless it has 
deletes?

I don't think so. Since this is common code, it's quite possible that during a 
forceMerge we assemble some segments that have no deletes (due to 
maxMergeAtOnceExplicit) but are still only a fraction of maxMergedSegmentMB. 
Those segments are eligible to be merged on the next pass even though they 
have no deleted documents, so we can't just omit them from the candidate list 
up front.
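
To make that concrete, here's a toy sketch of the eligibility question (the 
method and parameter names are mine, not from the patch):

{code:java}
// Toy illustration, not TieredMergePolicy code: why a zero-delete segment
// can still be a legitimate forceMerge candidate.
static boolean isForceMergeCandidate(long segmentBytes, int delCount,
                                     long maxMergedSegmentBytes) {
  if (delCount > 0) {
    return true; // always worth rewriting to reclaim deleted docs
  }
  // No deletes, but still only a fraction of the max segment size: a later
  // pass may merge it with siblings up toward the max, so it has to stay
  // on the candidate list.
  return segmentBytes < maxMergedSegmentBytes;
}
{code}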

bq. ...rename maxMergeAtonce to maxMergeAtOnce

Done. Autocomplete strikes again: one misspelling and it propagates.

bq.  I.e. change true to bestTooLarge?

I've no objection, but what's the functional difference? Just making sure 
there's not a typo there.

bq. I think this logic is buggy?

The more I look, the more I think it's _always_ been buggy, or at least should 
be restructured.

Check me out on this. As far as I can tell, mergingBytes would come out 
exactly the same in the old code every time it was calculated. On every pass 
through the loop that gathers the best merge, the code looks at the same 
infosSorted (which doesn't change), starts from the same point every time 
(tooBigCount, which doesn't change), and adds a segment to mergingBytes if 
(and only if) the segment is in mergeContext.getMergingSegments() (which also 
doesn't change).

mergingBytes really just asks whether the candidate segments that are 
currently being merged total more than maxMergedSegmentBytes. So I'll make the 
new code do the same thing, except calculate that value once, outside the 
loop, and just set it.
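
A sketch of what I mean, as it might live inside TieredMergePolicy (the helper 
name and structure are mine, not the exact patch):

{code:java}
// Illustrative only: since infosSorted, tooBigCount, and the set returned by
// mergeContext.getMergingSegments() don't change while we search for the best
// merge, mergingBytes can be computed once, before the selection loop.
private long computeMergingBytes(List<SegmentCommitInfo> infosSorted,
                                 int tooBigCount,
                                 MergeContext mergeContext) throws IOException {
  final Set<SegmentCommitInfo> merging = mergeContext.getMergingSegments();
  long mergingBytes = 0;
  for (int idx = tooBigCount; idx < infosSorted.size(); idx++) {
    final SegmentCommitInfo info = infosSorted.get(idx);
    if (merging.contains(info)) {
      mergingBytes += size(info, mergeContext); // MergePolicy.size()
    }
  }
  return mergingBytes;
}
// The selection loop then just reads a flag computed once up front:
//   boolean maxMergeIsRunning = mergingBytes >= maxMergedSegmentBytes;
{code}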

Or I'm missing something that'll be obvious when someone else points it out.

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> SOLR-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate (think many 
> TB), solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; 
> suggestions welcome), which would default to 100 (i.e., the same behavior we 
> have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases:
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten.
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment, for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
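
To make the proposed rule concrete, here is a hypothetical sketch; the 
constant and method names below are placeholders (the description itself says 
the parameter name isn't serious), not anything from the attached patches:

{code:java}
// Hypothetical sketch of the proposed knob, not an actual patch.
static final double MAX_ALLOWED_PCT_DELETED = 20.0; // proposed default: 100 (today's behavior)
static final long MAX_MERGED_SEGMENT_BYTES = 5L << 30; // current 5G default

// A segment whose deleted percentage exceeds the threshold becomes
// merge-eligible no matter how large it is.
static boolean mustReclaim(long liveBytes, long deletedBytes) {
  double pctDeleted = 100.0 * deletedBytes / (liveBytes + deletedBytes);
  return pctDeleted > MAX_ALLOWED_PCT_DELETED;
}
// For an eligible over-sized segment there are two cases:
//  - liveBytes < 5G: merge it with smaller segments up toward 5G
//  - liveBytes >= 5G (e.g. a forceMerged 100G segment): rewrite it as a
//    singleton merge, dropping deletes (100G at 20% deleted -> ~80G)
{code}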


