[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459460#comment-16459460 ]

Erick Erickson commented on LUCENE-7976:
----------------------------------------

OK, I think this is getting quite close. The place I'm most uncomfortable with 
is findForcedMerges. See the TODO around line 778, plus the fact that there's a 
bunch of special handling depending on whether we're forceMerging to one 
segment, to a max count, or respecting the max segment size. I want to make one 
more pass through it; there _ought_ to be a more organized way of doing this.

Also, when a maximum number of segments is given, we calculate the ideal 
segment size and then I increase it by 25% on the theory that segments won't 
pack perfectly; the extra headroom should let all the segments fit within the 
max segment count on the first pass.
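
A minimal sketch of that sizing heuristic, with illustrative names 
(totalLiveBytes, maxSegmentCount) rather than the actual patch code:

    // Illustrative sketch: per-segment byte budget for a forceMerge down to
    // maxSegmentCount segments, padded by 25% so imperfect packing still has
    // a chance of landing within the requested count on the first pass.
    // Assumes totalLiveBytes is a long and maxSegmentCount is an int.
    long idealSegmentBytes = (long) Math.ceil((double) totalLiveBytes / maxSegmentCount);
    long maxMergedSegmentBytes = (long) (idealSegmentBytes * 1.25);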

However, there are still edge cases, I think, where that won't necessarily work 
on the first pass, especially if there are very many segments. For that case, 
at the very end there's a loop essentially saying "go through as many 
iterations as necessary, increasing the max segment size by 25% each time, 
until you can fit them all in the required number of segments". This really 
means that in this case you could rewrite the entire index twice. Is that OK? I 
don't want to spend a lot of time on this case though; it seems to me that if 
you specify this you'll have to live with this edge case.
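
Roughly, that fallback amounts to the loop below; segmentsFitInto() is a 
stand-in for the real packing check, not an actual method in the patch:

    // Keep relaxing the per-segment budget by 25% until the segments can be
    // packed into the requested count; each extra iteration can mean
    // re-merging segments that were already rewritten once.
    while (!segmentsFitInto(infos, maxSegmentCount, maxMergedSegmentBytes)) {
      maxMergedSegmentBytes = (long) (maxMergedSegmentBytes * 1.25);
    }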

[~mikemccand] There's another departure from the old process here. If there are 
multiple passes for forceMerge, I keep returning null until there aren't any 
current merges running that involve the original segments. Is there any real 
point in trying to create another merge specification while merges from 
previous passes are still going on? This is around line 684.
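
In pseudocode it's essentially the check below (hypothetical variable names; 
the real code uses whatever view of in-flight merges the writer exposes):

    // Don't plan another forceMerge pass while any of the original segments
    // is still involved in a merge kicked off by an earlier pass.
    // originalSegments: Collection<SegmentCommitInfo>,
    // mergingSegments: Set<SegmentCommitInfo> (both hypothetical locals).
    for (SegmentCommitInfo info : originalSegments) {
      if (mergingSegments.contains(info)) {
        return null;
      }
    }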

I beasted all of Mike's failures 120 times, along with TestTieredMergePolicy, 
and saw no failures. All tests pass and precommit worked.

Then, of course, I made one tiny change, so I'll have to go 'round that testing 
again. I also have to make another couple of runs at counting the total bytes 
written to make sure nothing crept in.

That said, I think this is the last major rearrangement I want to do. If my 
additional testing succeeds and there are no objections, I'll probably commit 
sometime this weekend.

Thanks to all who've looked at this!

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; 
> suggestions welcome) which would default to 100 (i.e. the same behavior we 
> have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
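
For the quoted proposal, the eligibility rule boils down to something like the 
sketch below; maxAllowedPctDeleted is the placeholder parameter name from the 
description, and the surrounding code is illustrative only:

    // A segment whose deleted-document percentage exceeds the configured
    // threshold becomes merge-eligible no matter how large it is.
    // info is a SegmentCommitInfo for the segment being considered.
    double pctDeleted = 100.0 * info.getDelCount() / info.info.maxDoc();
    boolean eligibleDespiteSize = pctDeleted > maxAllowedPctDeleted;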


