In a brief test, indexing 500K documents and calling optimize() every 10K 
documents, I found that indexing time stays constant (flat) while the time 
for each optimize() call increases linearly.

-Sean
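
The linear growth reported above can be illustrated with a toy cost model.
This is a minimal sketch, not a measurement and not Lucene code. It assumes
each optimize() call rewrites every document currently in the index (optimize
merges all segments into one) and that cost is proportional to the number of
documents rewritten. The names OptimizeCostModel, docsRewrittenPerOptimize
and total are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OptimizeCostModel {

    /**
     * Toy model (assumption): a full optimize rewrites every document
     * currently in the index, so the cost of the k-th call is k * interval.
     */
    static List<Long> docsRewrittenPerOptimize(long totalDocs, long interval) {
        List<Long> costs = new ArrayList<>();
        for (long indexed = interval; indexed <= totalDocs; indexed += interval) {
            costs.add(indexed);
        }
        return costs;
    }

    /** Sum of documents rewritten across all optimize calls. */
    static long total(List<Long> costs) {
        long sum = 0;
        for (long c : costs) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Numbers from the test described above: 500K docs, optimize every 10K.
        List<Long> costs = docsRewrittenPerOptimize(500_000, 10_000);
        System.out.println("optimize calls:       " + costs.size());
        System.out.println("first call rewrites:  " + costs.get(0));
        System.out.println("last call rewrites:   " + costs.get(costs.size() - 1));
        System.out.println("total docs rewritten: " + total(costs));
    }
}
```

Under these assumptions the 50 calls rewrite 12,750,000 documents in total,
roughly 25 times the 500K documents actually indexed, so per-call cost that
grows linearly means total optimize work that grows quadratically.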

Grant Ingersoll wrote on 4/18/2007, 4:29 PM:

 > Has anyone done any benchmarking to approximate how long it takes to
 > optimize different size indexes?  Is the merging linear, sub-linear,
 > etc.?
 >
 > On Apr 8, 2007, at 1:01 AM, Otis Gospodnetic wrote:
 >
 > > I'd advise against calling optimize() at all in an environment
 > > whose indices are constantly updated.  That's what mergeFactor
 > > helps with.  Keep it low, and Lucene itself will merge
 > > segments more often.  If one still wants to call optimize(), you'd
 > > want to know how long it would take with an index of your size,
 > > and if you've got enough lull time, do it; otherwise postpone it.
 > >
 > > Otis
 > >  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 > > Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
 > >
 > > ----- Original Message ----
 > > From: Grant Ingersoll <[EMAIL PROTECTED]>
 > > To: java-dev@lucene.apache.org
 > > Sent: Friday, April 6, 2007 6:53:13 PM
 > > Subject: optimize() method call
 > >
 > > I was looking at the javadocs for the optimize() call on IndexWriter
 > > which contain a great amount of detail about what happens, but very
 > > little guidance on when.  I would like to add more on when.  I
 > > generally do optimize after I finish my indexing, which is pretty
 > > straightforward to determine when one has a more or less static
 > > collection.  What isn't so clear to me, b/c I haven't dealt w/ it too
 > > much is when optimize should be called in environments that are
 > > frequently updated.
 > >
 > > Here's what I have for text so far:
 > > *
 > >     * <p>It is recommended that this method be called upon completion
 > >     * of indexing.  In environments with frequent updates optimize is
 > >     * best FILL IN HERE
 > >     * </p>
 > >
 > > Essentially, I am wondering what are the best practices for calling
 > > optimize, especially in a frequent update environment.  My gut
 > > feeling is that it should just be scheduled to be done on a regular
 > > basis, ideally when there is a lull.  The docs allude to the fact
 > > that search performance will be better, but has anyone quantified
 > > it?  The mergeFactor docs say that a smaller merge factor results in
 > > faster searches on unoptimized indices (I presume that means faster
 > > relative to higher merge factors, but still not as fast as
 > > optimized, correct?).  If it hasn't been quantified, maybe I will try
 > > to whip up a benchmark for it.
 > >
 > > So, do people in these types of environments typically schedule
 > > optimize to occur at night or every few hours, or what?  I know, "It
 > > depends...", I'm just wondering if there is a general consensus that
 > > would be useful to pass along to readers.
 > >
 > > --------------------------
 > > Grant Ingersoll
 > > Center for Natural Language Processing
 > > http://www.cnlp.org
 > >
 > > Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
 > >
 > > ---------------------------------------------------------------------
 > > To unsubscribe, e-mail: [EMAIL PROTECTED]
 > > For additional commands, e-mail: [EMAIL PROTECTED]
 > >
 >
 > ------------------------------------------------------
 > Grant Ingersoll
 > http://www.grantingersoll.com/
 > http://lucene.grantingersoll.com
 > http://www.paperoftheweek.com/
 >
