[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490054#comment-16490054 ]
Erick Erickson commented on LUCENE-7976:
----------------------------------------

[~mikemccand] OK, I'm back at it.

------------ easy parts ------------

I'll incorporate all your style suggestions, NP there. Except for twoMayHaveBeenMerged. That's not a typo; it's meant to cover the case where there _could_ be one tiny segment and findForcedMerges legitimately combines that small segment with a larger one even if there are no deletions, while still respecting maxMergedSegmentMB. I've added a bit more explanation in the comments, though: if it confused you, it would confuse others. Maybe I'll be able to think of a better name.

bq. Can you change doFindMerges to not modify the incoming list...

I've done that. Pending the resolution to maxSegments, holdInCase may not be necessary, in which case copying to a local variable is unnecessary too. I'm not sure I care given how expensive merging is in general; one extra list copy is probably immeasurable (and cleaner).

------------ hard part 1 ------------

bq. These parts still seem smelly to me:

Yeah, me too.

bq. Or, maybe we can change the problem: when you forceMerge to N segments, should we really allow violating the maxMergedSegmentMB constraint in that case?

Frankly, I'd like to have exactly one valid value for maxSegments: 1. That is, the only choices would be to merge down to 1 segment or respect maxMergedSegmentMB. WDYT?

Another option is to relax what specifying maxSegments means to a "best effort": we just calculate the max segment size based on the number of segments specified and don't worry about meeting it exactly. Then the loop goes away. In that case, the 25% (or whatever) fudge factor isn't dangerous since there'd be no loop, but fudging it up isn't totally necessary either.

I do wonder who, if anybody, uses maxSegments except to avoid the issue that started this, i.e. an optimize creating a large segment that can then accumulate lots of deleted docs. Has it outlived its usefulness?

------------ hard part 2 ------------

bq.

And I think that's the line you were asking me about with this?

bq. But I think that's dangerous – won't this mean that we will fail to return a merge that could concurrently run with already merging segments, in the cascading case?

So the scenario here is that we have a long-running merge on one thread and, even though we could be running a dozen other small merges, we wouldn't run any of them until the big one was done, right? OK, I'll put this back.

------------ TBD then ------------

So I think the only thing remaining from your review is what to do about maxSegments and that really horrible loop. Options in the order I like ;)

1> Remove the loop entirely. Calculate the theoretical segment size, add 25% (or whatever), and call doFindMerges once. The purpose of the loop is to handle the case where that calculation results in N+M segments when N was specified, because M additional segments were created that didn't pack well (your bin-packing characterization). Update the docs to say maxSegments is honored on a "best effort" basis. This option preserves most of the intent of maxSegments without introducing the complexity/danger. And I think you're right: with enough segments we'd never get out of that loop.

2> Allow maxSegments to have only one valid value: 1. Otherwise we just respect maxMergedSegmentMB. Users can approximate doing forceMerge with maxSegments values other than 1 by configuring maxMergedSegmentMB to suit their wishes and doing a forceMerge without specifying maxSegments.

--------

So all told, this is closer than I feared when I saw how long your response was ;). If we decide to go with option <1> for maxSegments, all that would remain after that is putting back the bits that find merges even if there are merges currently running.
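For the "don't modify the incoming list" point above, a minimal sketch of the defensive-copy approach. This is illustrative only: plain Strings stand in for Lucene's segment objects, and the method shape is an assumption, not the actual TieredMergePolicy code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Strings stand in for Lucene's SegmentCommitInfo.
public class DoFindMergesSketch {
  // Works on a copy, so the caller's list is never mutated while we
  // carve merge candidates out of it.
  static List<String> doFindMerges(List<String> incoming) {
    List<String> candidates = new ArrayList<>(incoming); // defensive copy
    // Stand-in for the real selection logic: skip segments already merging.
    candidates.removeIf(name -> name.startsWith("merging-"));
    return candidates;
  }

  public static void main(String[] args) {
    List<String> segs = new ArrayList<>(List.of("seg1", "merging-seg2", "seg3"));
    List<String> picked = doFindMerges(segs);
    System.out.println(segs.size());   // still 3: caller's list untouched
    System.out.println(picked.size()); // 2
  }
}
```

As noted above, the extra list copy is almost certainly immeasurable next to the cost of the merge itself, and it keeps the caller's view of the segment list intact.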
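Option <1> above amounts to a one-shot size calculation instead of a loop. A hedged sketch of what that could look like; the method name, the 1.25 fudge factor, and the exact rounding are assumptions for illustration, not the patch's actual code.

```java
// Illustrative sketch of the "best effort" maxSegments idea: compute one
// per-segment byte budget up front and call doFindMerges exactly once,
// instead of looping until the resulting segment count matches.
public class BestEffortBudget {
  static long targetSegmentBytes(long totalLiveBytes, int maxSegments, double fudge) {
    // Theoretical even split (ceiling division), padded to absorb
    // imperfect bin packing of real segment sizes.
    long theoretical = (totalLiveBytes + maxSegments - 1) / maxSegments;
    return (long) (theoretical * fudge);
  }

  public static void main(String[] args) {
    long total = 100L * 1024 * 1024 * 1024; // 100 GiB of live data
    // forceMerge(10): 10 GiB theoretical per segment, 12.5 GiB with 25% headroom
    System.out.println(targetSegmentBytes(total, 10, 1.25)); // 13421772800
  }
}
```

Since there is no loop, overshooting to N+M segments is harmless by definition: the docs would simply describe maxSegments as a target, not a guarantee.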
> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
> We're seeing situations "in the wild" where there are very large indexes (on disk) handled quite easily in a single Lucene index. This is particularly true as features like docValues move data into MMapDirectory space. The current TMP algorithm allows on the order of 50% deleted documents, as per a dev list conversation with Mike McCandless (and his blog here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
>
> Especially in the current era of very large indexes in aggregate (think many TB), solutions like "you need to distribute your collection over more shards" become very costly. Additionally, the tempting "optimize" button exacerbates the issue, since once you form, say, a 100G segment (by optimizing/forceMerging) it is not eligible for merging until 97.5G of the docs in it are deleted (current default 5G max segment size).
>
> The proposal here would be to add a new parameter to TMP, something like <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name, suggestions welcome) which would default to 100 (i.e. the same behavior we have now).
>
> So if I set this parameter to, say, 20%, and the max segment size stays at 5G, the following would happen when segments were selected for merging:
>
> * Any segment with > 20% deleted documents would be merged or rewritten NO MATTER HOW LARGE. There are two cases:
> ** The segment has < 5G "live" docs. In that case it would be merged with smaller segments to bring the resulting segment up to 5G. If no smaller segments exist, it would just be rewritten.
> ** The segment has > 5G "live" docs (the result of a forceMerge or optimize). It would be rewritten into a single segment removing all deleted docs, no matter how big it is to start. The 100G example above would be rewritten to an 80G segment, for instance.
>
> Of course this would lead to potentially much more I/O, which is why the default would be the same behavior we see now. As it stands now, though, there's no way to recover from an optimize/forceMerge except to re-index from scratch. We routinely see 200G-300G Lucene indexes at this point "in the wild" with 10s of shards replicated 3 or more times. And that doesn't even include having these over HDFS.
>
> Alternatives welcome! Something like the above seems minimally invasive. A new merge policy is certainly an alternative.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
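The rule proposed in the quoted issue description boils down to a simple eligibility check. A sketch under assumptions: the parameter name follows the placeholder in the issue text, and this is not real TieredMergePolicy API.

```java
// Illustrative sketch of the proposed rule: a segment whose percentage of
// deleted docs exceeds the configured threshold becomes merge-eligible
// regardless of its on-disk size.
public class DeletePctEligibility {
  static boolean eligibleForMerge(int maxDoc, int delCount, double maxAllowedPctDeleted) {
    double pctDeleted = 100.0 * delCount / maxDoc;
    return pctDeleted > maxAllowedPctDeleted;
  }

  public static void main(String[] args) {
    // 100M-doc segment with 25M deletes: 25% > 20% threshold, so eligible
    System.out.println(eligibleForMerge(100_000_000, 25_000_000, 20.0)); // true
    // A threshold of 100 reproduces today's behavior: size cap always wins
    System.out.println(eligibleForMerge(100_000_000, 25_000_000, 100.0)); // false
  }
}
```

Whether the eligible segment is then merged with smaller segments or rewritten as a singleton would depend on its live-doc size relative to the max segment size, per the two cases in the description above.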