[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

Michael McCandless (JIRA) Thu, 10 May 2018 12:50:20 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471009#comment-16471009
 ]


Michael McCandless commented on LUCENE-7976:
--------------------------------------------

Net/net I like this change; it would allow the forced merges to pick "better" 
merges to achieve their constraints.  But I think we do need to keep 
iterating... this is really a complex change.

Can you remove the {{@Test}} annotations?  They are redundant.

{{twoManyHaveBeenMerged}} --> {{tooManyHaveBeenMerged}}?

Can you name it {{maxDoc}} not {{maxDocs}}?  E.g. {{segMaxDocs}} -> 
{{segMaxDoc}}, just to be consistent w/ {{maxDoc}} elsewhere in our sources.

{noformat}
    int originalSortedSize = sortedEligible.size();
    if (verbose(writer)) {
      message("findMerges: " + originalSortedSize + " segments", writer);
    }
    if (sortedEligible.size() == 0) {
      return null;
    }
{noformat}

You can use {{originalSortedSize}} in the if above?

Should the {{haveOneLargeMerge}} also look at the merging segments to decide 
whether a large merge is already running before invoking TMP? Otherwise the 
very next time IW invokes TMP it could send out another large merge.

This new change worries me:

{noformat}
    // If we're already merging, let it all merge before doing more.
    if (merging.size() > 0) return null;
{noformat}

And I think that's the line you were asking me about with this?

bq. Michael McCandless There's another departure from the old process here. If 
there are multiple passes for forceMerge, I keep returning null in until there 
aren't any current merges running involving the original segments. Is there any 
real point in trying to create another merge specification if there are merges 
from previous passes going on? This is around line 684.

But I think that's dangerous -- won't this mean that we will fail to return a 
merge that could concurrently run with already merging segments, in the 
cascading case?

These parts still seem smelly to me:

{noformat}
     // Fudge this up a bit so we have a better chance of not having to rewrite 
segments. If we use the exact size,
     // it's almost guaranteed that the segments won't fit perfectly and we'll 
be left with more segments than
     // we want and have to re-merge in the code at the bottom of this method.
     maxMergeBytes = Math.max((long) (((double) totalMergeBytes / (double) 
maxSegmentCount)), maxMergedSegmentBytes);
     maxMergeBytes = (long)((double) maxMergeBytes * 1.25);
{noformat}

{noformat}
    // we're merging down to some specified number of segments> 1, but the 
scoring with the first guess at max
    // size of segments didn't get us to the required number. So we need to 
relax the size restrictions until they all
    // fit and trust the scoring to give us the cheapest merges..
    // TODO: Is this really worth it? It potentially rewrites the entire index 
_again_. Or should we consider
    // maxSegments a "best effort" kind of thing? Or do we just assume that any 
bad consequences of people doing this
    // is SEP (Someone Else's Problem)?
    
    if (maxSegmentCount != Integer.MAX_VALUE) {
      while (spec == null || spec.merges.size() > maxSegmentCount) {
        maxMergeBytes = (long)((double)maxMergeBytes * 1.25);
        sortedSizeAndDocs.clear();
        sortedSizeAndDocs.addAll(holdInCase);
        spec = doFindMerges(sortedSizeAndDocs, maxMergeBytes, 
maxMergeAtOnceExplicit,
            maxSegmentCount, MERGE_TYPE.FORCE_MERGE, writer);
      }
    }
{noformat}

The 25% up front fudge factor, the retrying in the end with larger and larger 
fudge factors ... seems risky; rewriting the index 2X times during 
{{forceMerge}} is no good.  I wonder if there's a better way; it's sort of a 
bin packing problem, where you need to distribute existing segments into N bins 
where the bins are close to the same size?

Or, maybe we can change the problem: when you forceMerge to N segments, should 
we really allow violating the {{maxMergedSegmentMB}} constraint in that case?

I think that retry loop can run forever, if you have an index that has so many 
segments that in order to force merge down to the requested count, the merges 
must cascade?  Maybe make a test?  E.g. if you want to force merge to 5 
segments, and the index has more than 6 * 30 (default 
{{maxMergeAtOnceExplicit}}) then it may spin forever?

Can you change {{doFindMerges}} to not modify the incoming list (i.e. make its 
own copy), then you don't need to {{holdInCase}} local variable?

Can you use an ordinary {{if}} here instead of ternary operator?  And invert 
the {{if}} to {{if (mergeType == MERGE_TYPE.NATURAL)}}?  And then move the {{// 
if forcing}} comment inside the else clause?

{noformat}
        int lim = (mergeType != MERGE_TYPE.NATURAL) ? sortedEligible.size() - 1 
: sortedEligible.size() - maxMergeAtonce;
{noformat}


> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not serious name, suggestions 
> welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

Reply via email to