In a brief test in which I indexed 500K documents and called optimize() after every 10K documents, I found that the time to index each batch stayed constant (flat), while optimize() time increased linearly with the size of the index.
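For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the test loop. `BenchIndex` is a hypothetical interface, not a Lucene API. An adapter would forward its two methods to `IndexWriter.addDocument(...)` and `IndexWriter.optimize()`. The `optimizeSlope` helper fits a least-squares line to optimize time versus index size, which is one way to check whether the growth is linear.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a batch-then-optimize benchmark: add documents in fixed-size
 * batches, call optimize() after each batch, and time both phases.
 * BenchIndex is a hypothetical stand-in for the index under test.
 */
final class OptimizeBenchmark {

    /** Hypothetical abstraction; wrap a real IndexWriter behind it to measure. */
    interface BenchIndex {
        void addDocument(int docId);
        void optimize();
    }

    /** One measurement: index size after the batch plus the two timings. */
    static final class Sample {
        final int docCount;       // documents in the index after this batch
        final long indexNanos;    // time spent adding this batch
        final long optimizeNanos; // time spent in optimize() right afterwards

        Sample(int docCount, long indexNanos, long optimizeNanos) {
            this.docCount = docCount;
            this.indexNanos = indexNanos;
            this.optimizeNanos = optimizeNanos;
        }
    }

    static List<Sample> run(BenchIndex index, int totalDocs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Sample> samples = new ArrayList<>();
        int docId = 0;
        while (docId < totalDocs) {
            int batchEnd = Math.min(docId + batchSize, totalDocs);
            long start = System.nanoTime();
            while (docId < batchEnd) {
                index.addDocument(docId++);
            }
            long indexed = System.nanoTime();
            index.optimize();
            long optimized = System.nanoTime();
            samples.add(new Sample(docId, indexed - start, optimized - indexed));
        }
        return samples;
    }

    /** Least-squares slope of optimize time vs. index size, in nanos per document. */
    static double optimizeSlope(List<Sample> samples) {
        if (samples.size() < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = 0, meanY = 0;
        for (Sample s : samples) {
            meanX += s.docCount;
            meanY += s.optimizeNanos;
        }
        meanX /= samples.size();
        meanY /= samples.size();
        double num = 0, den = 0;
        for (Sample s : samples) {
            double dx = s.docCount - meanX;
            num += dx * (s.optimizeNanos - meanY);
            den += dx * dx;
        }
        return den == 0 ? 0 : num / den;
    }
}
```

With the parameters above this would be `run(adapter, 500_000, 10_000)`, giving 50 samples. A roughly constant `indexNanos` column together with a steadily growing `optimizeNanos` column would correspond to the result described here.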
-Sean

Grant Ingersoll wrote on 4/18/2007, 4:29 PM:
> Has anyone done any benchmarking to approximate how long it takes to
> optimize indexes of different sizes? Is the merging linear, sub-linear,
> etc.?
>
> On Apr 8, 2007, at 1:01 AM, Otis Gospodnetic wrote:
>
> > I'd advise against calling optimize() at all in an environment
> > whose indices are constantly updated. That's what mergeFactor
> > helps with. Keep it low, and Lucene itself will merge
> > segments more often. If one still wants to call optimize(), you'd
> > want to know how long it would take with an index of your size,
> > and if you've got enough lull time, do it; otherwise postpone it.
> >
> > Otis
> > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > Simpy -- http://www.simpy.com/ - Tag - Search - Share
> >
> > ----- Original Message ----
> > From: Grant Ingersoll <[EMAIL PROTECTED]>
> > To: java-dev@lucene.apache.org
> > Sent: Friday, April 6, 2007 6:53:13 PM
> > Subject: optimize() method call
> >
> > I was looking at the javadocs for the optimize() call on IndexWriter,
> > which contain a great amount of detail about what happens, but very
> > little guidance on when. I would like to add more on when. I
> > generally optimize after I finish my indexing, which is pretty
> > straightforward to determine when one has a more or less static
> > collection. What isn't so clear to me, b/c I haven't dealt w/ it too
> > much, is when optimize should be called in environments that are
> > frequently updated.
> >
> > Here's what I have for text so far:
> >  *
> >  * <p>It is recommended that this method be called upon completion of indexing. In
> >  * environments with frequent updates optimize is best FILL IN HERE
> >  * </p>
> >
> > Essentially, I am wondering what the best practices are for calling
> > optimize, especially in a frequent update environment. My gut
> > feeling is that it should just be scheduled to be done on a regular
> > basis, ideally when there is a lull.
> > The docs allude to the fact that
> > search performance will be better, but has anyone quantified
> > it? The mergeFactor docs say that a smaller merge factor results in
> > faster searches on unoptimized indexes (I presume that means faster
> > relative to higher merge factors, but still not as fast as
> > optimized, correct?). If it hasn't been quantified, maybe I will try
> > to whip up a benchmark for it.
> >
> > So, do people in these types of environments typically schedule
> > optimize to occur at night or every few hours, or what? I know, "It
> > depends...", I'm just wondering if there is a general consensus that
> > would be useful to pass along to readers.
> >
> > --------------------------
> > Grant Ingersoll
> > Center for Natural Language Processing
> > http://www.cnlp.org
> >
> > Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> ------------------------------------------------------
> Grant Ingersoll
> http://www.grantingersoll.com/
> http://lucene.grantingersoll.com
> http://www.paperoftheweek.com/
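Otis's advice in the thread (optimize only if the expected run time fits into the available lull, otherwise postpone) can be sketched as a small policy class. It assumes, following the linear growth reported at the top of this message, that optimize() time is roughly `fixedNanos + nanosPerDoc * docCount`. The class name, its parameters, and the safety factor are all hypothetical, and the coefficients would have to be fitted from measurements on your own index and hardware.

```java
/**
 * Hypothetical policy: run optimize() only when the estimated duration,
 * padded by a safety factor, fits inside the expected lull; otherwise postpone.
 * Assumes optimize time grows roughly linearly with the number of documents.
 */
final class OptimizeScheduler {
    private final double nanosPerDoc;  // fitted slope from your own measurements
    private final double fixedNanos;   // fitted intercept (per-call overhead)
    private final double safetyFactor; // headroom, e.g. 1.5 = 50% margin

    OptimizeScheduler(double nanosPerDoc, double fixedNanos, double safetyFactor) {
        if (nanosPerDoc < 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("invalid model parameters");
        }
        this.nanosPerDoc = nanosPerDoc;
        this.fixedNanos = fixedNanos;
        this.safetyFactor = safetyFactor;
    }

    /** Linear estimate of how long optimize() will take for docCount documents. */
    long estimateOptimizeNanos(long docCount) {
        // Clamp at zero in case a fitted intercept came out slightly negative.
        return (long) Math.ceil(Math.max(0.0, fixedNanos + nanosPerDoc * docCount));
    }

    /** True if the padded estimate fits in the lull; false means postpone. */
    boolean shouldOptimizeNow(long docCount, long lullNanos) {
        return estimateOptimizeNanos(docCount) * safetyFactor <= lullNanos;
    }
}
```

For example, with a fitted 1,000 ns per document, no fixed overhead, and a 1.5x safety factor, a 1,000,000-document index is estimated at 1 s and would only be optimized during a lull of at least 1.5 s.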