I think it should be an easy port...

Mike
http://blog.mikemccandless.com

On Mon, May 2, 2011 at 2:16 PM, Shai Erera <ser...@gmail.com> wrote:
> Thanks Mike. I'll take a look at TieredMP. Does it depend on trunk in any
> way, or do you think it can easily be ported to 3x?
> Shai
>
> On Mon, May 2, 2011 at 6:34 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> Actually the new TieredMergePolicy (only on trunk currently but I plan
>> to backport for 3.2) lets you set the max merged segment size
>> (maxMergedSegmentMB).
>>
>> It's only an "estimate", but if it's set, it tries to pick a merge
>> reaching around that target size.
>>
>> Mike
>>
>> http://blog.mikemccandless.com
>>
>> On Mon, May 2, 2011 at 9:03 AM, Shai Erera <ser...@gmail.com> wrote:
>> > Hi
>> >
>> > Today, LogMP allows you to set different thresholds for segment sizes,
>> > thereby allowing you to control the largest segment that will be
>> > considered for merge + the largest segment your index will hold (=~
>> > threshold * mergeFactor).
>> >
>> > So, if you want to end up w/ say 20GB segments, you can set
>> > maxMergeMB(ForOptimize) to 2GB and mergeFactor=10.
>> >
>> > However, this often does not achieve your desired goal -- if the index
>> > contains 5 and 7 GB segments, they will never be merged b/c they are
>> > bigger than the threshold. I am willing to spend the CPU and IO
>> > resources to end up w/ 20 GB segments, whether I'm merging 10 segments
>> > together or only 2. After I reach a 20GB segment, it can rest
>> > peacefully, at least until I increase the threshold.
>> >
>> > So I wonder, first, if this threshold (i.e., largest segment size you
>> > would like to end up with) is more natural to set than the current
>> > thresholds, from the application level? I.e., wouldn't it be a simpler
>> > threshold to set instead of doing weird calculus that depends on
>> > maxMergeMB(ForOptimize) and mergeFactor?
>> >
>> > Second, should this be an addition to LogMP, or a different
>> > type of MP.
>> > One that adheres to only those two factors (perhaps the
>> > segSize threshold should be allowed to be set differently for optimize
>> > and regular merges). It can pick segments for merge such that it
>> > maximizes the result segment size (i.e., don't necessarily merge in
>> > sequential order), but not more than mergeFactor.
>> >
>> > I guess, if we think that maxResultSegmentSizeMB is more intuitive than
>> > the current thresholds, application-wise, then this change should go
>> > into LogMP. Otherwise, it feels like a different MP is needed, because
>> > LogMP is already complicated and another threshold would confuse things.
>> >
>> > What do you think of this? Am I trying to optimize too much? :)
>> >
>> > Shai
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
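For illustration, the selection rule Shai proposes can be sketched roughly as follows, next to the two LogMP rules he describes. This is a hypothetical, self-contained example, not Lucene code. The class and method names (TargetSizeMergeSketch, logMpLargestSegmentGB, logMpEligible, pickMerge) are made up. Sizes are whole GB rather than bytes, and LogMP is reduced to the two rules stated in the thread. The exhaustive subset search is only practical for a handful of segments. It makes no claim about how TieredMergePolicy actually chooses merges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Lucene's MergePolicy API. Sizes are whole GB.
final class TargetSizeMergeSketch {

    // LogMP as described in the thread: the largest segment you end up with is
    // roughly threshold * mergeFactor (an approximation, "=~" in the thread).
    static long logMpLargestSegmentGB(long maxMergeGB, int mergeFactor) {
        return maxMergeGB * mergeFactor;
    }

    // LogMP as described in the thread: a segment above the threshold is never merged.
    static boolean logMpEligible(long segmentGB, long maxMergeGB) {
        return segmentGB <= maxMergeGB;
    }

    // Proposed rule: among segments still below targetGB, pick between 2 and
    // mergeFactor of them, in any order, maximizing the merged size without
    // exceeding targetGB. Returns an empty list when no such merge exists.
    static List<Long> pickMerge(List<Long> segmentsGB, long targetGB, int mergeFactor) {
        List<Long> candidates = new ArrayList<>();
        for (long s : segmentsGB) {
            if (s < targetGB) {
                candidates.add(s); // segments that reached the target are left alone
            }
        }
        int n = candidates.size();
        if (n > 20) {
            // Exhaustive search is exponential; fine for a sketch, not for real indexes.
            throw new IllegalArgumentException("sketch only handles up to 20 candidate segments");
        }
        int bestMask = 0;
        long bestSum = 0;
        // Each bit of mask selects one candidate segment.
        for (int mask = 1; mask < (1 << n); mask++) {
            int count = Integer.bitCount(mask);
            if (count < 2 || count > mergeFactor) {
                continue;
            }
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sum += candidates.get(i);
                }
            }
            if (sum <= targetGB && sum > bestSum) {
                bestSum = sum;
                bestMask = mask;
            }
        }
        List<Long> picked = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if ((bestMask & (1 << i)) != 0) {
                picked.add(candidates.get(i));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println(logMpLargestSegmentGB(2, 10)); // 20
        System.out.println(logMpEligible(5, 2));          // false: LogMP skips the 5 GB segment
        // 12 + 8 = 20 is chosen, skipping the 9 GB segment in between.
        System.out.println(pickMerge(List.of(12L, 9L, 8L, 3L), 20, 10)); // [12, 8]
    }
}
```

Take Shai's example of 5 GB and 7 GB segments with maxMergeMB at 2 GB. LogMP never touches either segment. The target-size rule, called as pickMerge(List.of(5L, 7L, 20L, 1L, 1L), 20, 10), merges 5, 7, 1 and 1 into a 14 GB segment and leaves the existing 20 GB segment alone. The brute-force search is a deliberate simplification. A real policy would need a cheaper heuristic.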