Re: MergePolicy Thresholds

Earwin Burrfoot Mon, 02 May 2011 07:23:45 -0700

Dunno, I'm quite happy with numLargeSegments (you critically
misspelled it). It neatly avoids uber-merges, keeps the number of
segments in check, and does not require me to recalculate thresholds
when my expected index size changes.


The problem is that each person needs his own set of knobs (or thinks
he needs them) for MergePolicy, and I can't call any of these sets
superior to the others :/

2011/5/2 Shai Erera <ser...@gmail.com>:
> I did look at it, but I didn't find that it answers this particular need
> (ending up with a segment no bigger than X). Perhaps by tweaking several
> parameters (e.g. maxLarge/SmallNumSegments + maxMergeSizeMB) I can achieve
> something, but it's not very clear what the right combination is.
>
> Which is related to one of the points -- is it not more intuitive for an app
> to set this threshold (if it needs any thresholds), than tweaking all of
> those parameters? If so, then we only need two thresholds (size +
> mergeFactor), and we can reuse BalancedMP's findBalancedMerges logic
> (perhaps w/ some adaptations) to derive a merge plan.
>
> Shai
>
> On Mon, May 2, 2011 at 4:42 PM, Earwin Burrfoot <ear...@gmail.com> wrote:
>>
>> Have you checked BalancedSegmentMergePolicy? It has some more knobs :)
>>
>> On Mon, May 2, 2011 at 17:03, Shai Erera <ser...@gmail.com> wrote:
>> > Hi
>> >
>> > Today, LogMP allows you to set different thresholds for segment sizes,
>> > thereby allowing you to control the largest segment that will be
>> > considered for merge + the largest segment your index will hold (=~
>> > threshold * mergeFactor).
>> >
>> > So, if you want to end up w/ say 20GB segments, you can set
>> > maxMergeMB(ForOptimize) to 2GB and mergeFactor=10.
>> >
>> > However, this often does not achieve your desired goal -- if the index
>> > contains 5 and 7 GB segments, they will never be merged b/c they are
>> > bigger than the threshold. I am willing to spend the CPU and IO
>> > resources to end up w/ 20 GB segments, whether I'm merging 10 segments
>> > together or only 2. After I reach a 20GB segment, it can rest
>> > peacefully, at least until I increase the threshold.
>> >
>> > So I wonder, first, if this threshold (i.e., largest segment size you
>> > would like to end up with) is more natural to set than the current
>> > thresholds, from the application level? I.e., wouldn't it be a simpler
>> > threshold to set instead of doing weird calculations that depend on
>> > maxMergeMB(ForOptimize) and mergeFactor?
>> >
>> > Second, should this be an addition to LogMP, or a different
>> > type of MP, one that adheres to only those two factors (perhaps the
>> > segSize threshold should be settable separately for optimize and
>> > regular merges)? It can pick segments for merge such that it maximizes
>> > the result segment size (i.e., it doesn't necessarily merge in
>> > sequential order), but never merges more than mergeFactor segments.
>> >
>> > I guess, if we think that maxResultSegmentSizeMB is more intuitive than
>> > the current thresholds, application-wise, then this change should go
>> > into LogMP. Otherwise, it feels like a different MP is needed, because
>> > LogMP is already complicated and another threshold would confuse things.
>> >
>> > What do you think of this? Am I trying to optimize too much? :)
>> >
>> > Shai
>
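To make the two behaviours discussed in the quoted thread concrete, here is a minimal Java sketch. It is not Lucene code and does not use any Lucene API: the class MergeThresholdSketch and both methods are hypothetical, segment sizes are plain numbers in MB, and the greedy largest-first selection is just one possible reading of "maximize the result segment size, but not more than mergeFactor". The first method mimics the per-segment maxMergeMB cutoff described for LogMP; the second mimics the proposed maxResultSegmentSizeMB threshold.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeThresholdSketch {

    // Simplified per-segment cutoff as described in the thread: a segment
    // at or above maxMergeMB is never considered for merging.
    static List<Long> eligibleUnderMaxMergeMB(List<Long> segmentSizesMB, long maxMergeMB) {
        List<Long> eligible = new ArrayList<>();
        for (long size : segmentSizesMB) {
            if (size < maxMergeMB) {
                eligible.add(size);
            }
        }
        return eligible;
    }

    // Hypothetical selection for the proposed threshold: walk segments from
    // largest to smallest and keep each one that still fits under
    // maxResultSegmentSizeMB, stopping at mergeFactor segments.
    static List<Long> pickForMerge(List<Long> segmentSizesMB,
                                   long maxResultSegmentSizeMB,
                                   int mergeFactor) {
        List<Long> sorted = new ArrayList<>(segmentSizesMB);
        sorted.sort(Comparator.reverseOrder());

        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (long size : sorted) {
            if (picked.size() == mergeFactor) {
                break;
            }
            if (size >= maxResultSegmentSizeMB) {
                continue; // already at the target size, leave it alone
            }
            if (total + size <= maxResultSegmentSizeMB) {
                picked.add(size);
                total += size;
            }
        }
        // A merge needs at least two segments to be worth doing.
        return picked.size() >= 2 ? picked : new ArrayList<>();
    }

    public static void main(String[] args) {
        // 5 GB, 7 GB and 1 GB segments, sizes in MB.
        List<Long> sizes = List.of(5120L, 7168L, 1024L);
        System.out.println(eligibleUnderMaxMergeMB(sizes, 2048));
        System.out.println(pickForMerge(sizes, 20480, 10));
    }
}
```

With a 2 GB per-segment cutoff, the 5 GB and 7 GB segments drop out of consideration entirely, while the result-size threshold of 20 GB lets all three be merged together. A greedy pick is not guaranteed to find the largest possible result; a real policy would also have to weigh merge cost and segment order.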



-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
