[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490054#comment-16490054 ]

Erick Erickson commented on LUCENE-7976:
----------------------------------------

[~mikemccand] OK, I'm back at it.

------------easy parts.
I'll incorporate all your style suggestions, NP there. Except for 
twoMayHaveBeenMerged. That's not a typo; it's meant to cover the case where 
there _could_ be one tiny segment and findForcedMerges legitimately combines 
that small segment with a larger one, even if there are no deletions, while 
still respecting maxMergedSegmentSizeMB. I've added a bit more explanation in 
the comments though; if it confused you it would confuse others. Maybe I'll be 
able to think of a better name.
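
To make that concrete, here's roughly the shape of the case it covers (the 
names here are made up for illustration, not the patch's code):

{code:java}
class TwoMayHaveBeenMergedSketch {
  // Even with no deletions, findForcedMerges may legitimately fold one tiny
  // segment into a larger neighbor, as long as the result respects the cap.
  static boolean twoMayHaveBeenMerged(long tinySegBytes, long largerSegBytes,
                                      long maxMergedSegmentBytes) {
    return tinySegBytes + largerSegBytes <= maxMergedSegmentBytes;
  }
}
{code}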

bq. Can you change doFindMerges to not modify the incoming list...
I've done that. Pending the resolution of the maxSegments question, holdInCase 
may not be necessary, in which case copying to a local variable is unnecessary 
too. I'm not sure I care given how expensive merging is in general; one extra 
list copy is probably immeasurable (and cleaner).
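
For the record, it's just the usual defensive-copy idiom, something like this 
sketch (not the patch's actual signature):

{code:java}
import java.util.ArrayList;
import java.util.List;

class CopySketch {
  // Work on a private copy so the caller never sees its list reordered or shrunk.
  static <T> List<T> defensiveCopy(List<T> incoming) {
    return new ArrayList<>(incoming);
  }
}
{code}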

------hard part 1
bq. These parts still seem smelly to me:
Yeah, me too.

bq. Or, maybe we can change the problem: when you forceMerge to N segments, 
should we really allow violating the maxMergedSegmentMB constraint in that case?

Frankly I'd like to have exactly one valid value for maxSegments: 1. That is, 
the only choices would be to merge down to 1 segment or respect 
maxMergedSegmentMB. WDYT?

Another option is to relax what specifying maxSegments means to a "best 
effort": just calculate the max segment size based on the number of segments 
specified and don't worry about meeting it exactly. Then the loop goes away. 
In that case, the 25% (or whatever) fudge factor isn't dangerous since there'd 
be no loop, but fudging it up isn't strictly necessary either.

I do wonder who, if anybody, uses maxSegments except to avoid the issue that 
started this, i.e. an optimize creating a large segment that can accumulate 
lots of deleted docs. Has it outlived its usefulness?


---- hard part 2
bq. And I think that's the line you were asking me about with this?
bq. But I think that's dangerous – won't this mean that we will fail to return 
a merge that could concurrently run with already merging segments, in the 
cascading case?

So the scenario here is that we have a long-running merge on one thread and, 
even though we could be running a dozen other small merges, we wouldn't run 
any of them until the big one was done, right?
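
So instead of bailing out entirely while any merge is in flight, only the 
segments already being merged get excluded. Roughly this shape, with 
placeholder types:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ConcurrentMergeSketch {
  // Skip only the segments already being merged; a long-running merge on one
  // thread shouldn't starve a dozen small merges that could run alongside it.
  static <S> List<S> eligibleForNewMerges(List<S> allSegments, Set<S> currentlyMerging) {
    List<S> eligible = new ArrayList<>();
    for (S seg : allSegments) {
      if (!currentlyMerging.contains(seg)) {
        eligible.add(seg);
      }
    }
    return eligible;
  }
}
{code}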

OK, I'll put this back.

--------------- TBD then:
So I think the only thing remaining from your review is what to do about 
maxSegments and that really horrible loop. Options in the order I like ;)

1> Remove the loop entirely. Calculate the theoretical segment size, add 25% 
(or whatever), and call doFindMerges once (rough sketch below). The purpose of 
the loop is to handle the case where that calculation results in N+M segments 
when N was specified but M additional segments were created because they 
didn't pack well (your bin packing characterization). Update the docs to say 
it's on a "best effort" basis.

This option preserves most of the intent of maxSegments without introducing the 
complexity/danger. And I think you're right, with enough segments we'd never 
get out of that loop.
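
A back-of-the-envelope version of that calculation (the 25% is just the fudge 
factor from above, not a settled constant):

{code:java}
class BestEffortSketch {
  // Theoretical per-segment size for N requested segments, padded by 25% so
  // imperfect bin packing doesn't force a second pass. Called once, no loop.
  static long bestEffortMaxSegmentBytes(long totalLiveBytes, int maxSegments) {
    long theoretical = totalLiveBytes / maxSegments;
    return theoretical + theoretical / 4;
  }
}
{code}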

2> Allow maxSegments to have only one valid value: 1. Otherwise we just 
respect maxMergedSegmentSizeMB. Users can approximate a forceMerge with 
maxSegments values other than 1 by configuring maxMergedSegmentSizeMB to suit 
their wishes and doing a forceMerge without specifying maxSegments.
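
For illustration, something like the following would get a user most of the 
way there with the existing setter (the index size and target count are made 
up):

{code:java}
import org.apache.lucene.index.TieredMergePolicy;

class ForceMergeBySize {
  // Approximate "forceMerge down to ~N segments" by deriving the size cap
  // from the index size, then forceMerge without passing maxSegments at all.
  static TieredMergePolicy configureFor(long totalIndexBytes, int targetSegments) {
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setMaxMergedSegmentMB((totalIndexBytes / (double) targetSegments) / (1024.0 * 1024.0));
    return tmp;
  }
}
{code}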

--------
So all told this is closer than I feared when I saw how long your response 
was ;). If we decide to go with option <1> for maxSegments, all that would 
remain after that is putting back the bits that find merges even when there 
are merges currently running.



> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; 
> suggestions welcome) which would default to 100 (i.e. the same behavior we 
> have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
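
A rough sketch of the eligibility check proposed above (names are illustrative 
only):

{code:java}
class DeletePctSketch {
  // A segment becomes merge-eligible once its deletion percentage crosses the
  // threshold, no matter how large it is; e.g. a threshold of 20 means a 100G
  // segment with >20% deletes gets merged or rewritten.
  static boolean eligibleNoMatterHowLarge(int delDocCount, int maxDoc,
                                          double maxAllowedPctDeleted) {
    return 100.0 * delDocCount / maxDoc > maxAllowedPctDeleted;
  }
}
{code}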


