Thanks Mike! It's great to have other eyes on it (and I'm taking a bit
of a break to come back at it with fresh eyes).

It'll be a bit before I can respond in detail. So far the latest patch
has successfully run through one full test iteration, which is of course
totally inadequate before checking in. I intend to send it through a
bunch more before thinking about committing, but any failing cases are
most welcome since I can beast them.

Again, thanks for taking the time.

On Tue, Apr 24, 2018 at 8:48 AM, Michael McCandless (JIRA)
<j...@apache.org> wrote:
>
>     [ 
> https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450110#comment-16450110
>  ]
>
> Michael McCandless commented on LUCENE-7976:
> --------------------------------------------
>
> {quote}
> // We did our best to find the right merges, but through the vagaries of the
> // scoring algorithm etc. we didn't merge down to the required max segment
> // count. So merge the N smallest segments to make it so.
> {quote}
> Hmm can you describe why this would happen?  Seems like if you ask the 
> scoring algorithm to find merges down to N segments, it shouldn't ever fail?
>
> We also seem to invoke {{getSegmentSizes}} more than once in 
> {{findForcedMerges}}?
>
>> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges 
>> of very large segments
>> -------------------------------------------------------------------------------------------------
>>
>>                 Key: LUCENE-7976
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>>             Project: Lucene - Core
>>          Issue Type: Improvement
>>            Reporter: Erick Erickson
>>            Assignee: Erick Erickson
>>            Priority: Major
>>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, 
>> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>>
>>
>> We're seeing situations "in the wild" where there are very large indexes (on 
>> disk) handled quite easily in a single Lucene index. This is particularly 
>> true as features like docValues move data into MMapDirectory space. The 
>> current TMP algorithm allows on the order of 50% deleted documents as per a 
>> dev list conversation with Mike McCandless (and his blog here:  
>> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
>> Especially in the current era of very large indexes in aggregate (think 
>> many TB), solutions like "you need to distribute your collection over more 
>> shards" become very costly. Additionally, the tempting "optimize" button 
>> exacerbates the issue since once you form, say, a 100G segment (by 
>> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
>> docs in it are deleted (current default 5G max segment size).
>> The proposal here would be to add a new parameter to TMP, something like 
>> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name, 
>> suggestions welcome) which would default to 100 (or the same behavior we 
>> have now).
>> So if I set this parameter to, say, 20%, and the max segment size stays at 
>> 5G, the following would happen when segments were selected for merging:
>> > any segment with > 20% deleted documents would be merged or rewritten NO 
>> > MATTER HOW LARGE. There are two cases,
>> >> the segment has < 5G "live" docs. In that case it would be merged with 
>> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
>> >> segments exist, it would just be rewritten
>> >> The segment has > 5G "live" docs (the result of a forceMerge or 
>> >> optimize). It would be rewritten into a single segment removing all 
>> >> deleted docs no matter how big it is to start. The 100G example above 
>> >> would be rewritten to an 80G segment for instance.
>> Of course this would lead to potentially much more I/O which is why the 
>> default would be the same behavior we see now. As it stands now, though, 
>> there's no way to recover from an optimize/forceMerge except to re-index 
>> from scratch. We routinely see 200G-300G Lucene indexes at this point "in 
>> the wild" with 10s of  shards replicated 3 or more times. And that doesn't 
>> even include having these over HDFS.
>> Alternatives welcome! Something like the above seems minimally invasive. A 
>> new merge policy is certainly an alternative.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
