Thanks, Michael, for the feedback. A couple more questions:

1) Doesn't Lucene do some sort of optimization internally based on
mergeFactor? I.e., if the number of segments grows beyond the
mergeFactor, Lucene automatically merges them into one segment - is
this different from optimization? Does optimize do more than this?
The reason we keep a high mergeFactor (200) is so that Lucene doesn't
do frequent optimization on its own.

2) Do you know an approximate release date for 2.3?


We do have around 30 fields in our index (over 10 are untokenized -
can I just make them NO_NORMS?).
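For example, is this the right way to do it? (Just a sketch - the
field name and value are made up.)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // untokenized field with norms turned off, so it shouldn't
    // take a norms byte per document
    doc.add(new Field("hostname", "app01",
                      Field.Store.YES, Field.Index.NO_NORMS));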

Thanks,
-vivek

On Jan 18, 2008 2:37 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> vivek sar wrote:
>
> > Hi,
> >
> >   We are using Lucene 2.2. We have an index of size 70 GB (built
> > within 3-4 days) and growing. We run optimize pretty frequently
> > (once every hour, due to the large number of index updates - up to
> > 100K new documents per minute). I have seen the optimize take 3-4
> > hours to complete and use up to 8 GB of memory (our limit). This
> > makes the whole system slow. A few questions:
> >
> > 1) Is there any alternative to optimize? That is, can we do without
> > optimize and still have our search fast?
>
> In Lucene 2.3 (coming out shortly) there is a new "partial optimize"
> method that takes an int maxNumSegments.  It will optimize your index
> down to that many segments.  This lets you reduce the cost of
> optimizing while still getting faster searching.  Maybe try that?
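> For example, something along these lines (a sketch; the target
> segment count of 10 is arbitrary - test what works for you):
>
>     import org.apache.lucene.analysis.standard.StandardAnalyzer;
>     import org.apache.lucene.index.IndexWriter;
>
>     IndexWriter writer = new IndexWriter("/path/to/index",
>                                          new StandardAnalyzer(),
>                                          false);
>     // merge down to at most 10 segments instead of 1
>     writer.optimize(10);
>     writer.close();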
>
> Also, Lucene 2.3 has sizable speedups to indexing, and uses RAM
> buffering more efficiently.  This will let you hold more documents in
> RAM before flushing a new segment, which in turn should reduce your
> merging cost.
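> With 2.3 you could also flush by RAM usage rather than by buffered
> doc count, e.g. (sketch; 64 MB is just an example figure):
>
>     // flush a new segment once ~64 MB of RAM is used, instead
>     // of after a fixed number of buffered documents
>     writer.setRAMBufferSizeMB(64);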
>
> > 2) What's the best way to use optimize, i.e., how can we make
> > optimize much faster and use less memory?
>
> Fundamentally, optimize is quite time-consuming because it has to do
> massive segment merging (the final merge being the worst).
>
> However, it is not supposed to be so memory-consuming.  How are you
> measuring memory usage?  (Try running with java -verbose:gc to see
> actual heap usage after a full GC.)  What kind of documents are you
> creating?  E.g., one known memory issue is having many diverse
> fields, all with norms enabled.  Norms are not stored sparsely, so
> this will consume a lot of RAM during optimize and during searching.
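> If a field doesn't need index-time boosting or length normalization
> you can turn its norms off when you add it, e.g. (sketch; the field
> name and value are made up):
>
>     import org.apache.lucene.document.Field;
>
>     Field f = new Field("status", "active",
>                         Field.Store.YES, Field.Index.UN_TOKENIZED);
>     f.setOmitNorms(true);  // no norms entry for this field
>     doc.add(f);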
>
> > 3) Is there a way to partition the indexes using Lucene? Let's say we
> > partition daily, so we have to optimize only the daily indexes and not
> > the whole thing.
>
> Yes, you can do this, and run searches over these indices (use
> MultiSearcher).  You can then merge them into a single index, using
> addIndexesNoOptimize.  But this is not really different from using
> optimize(int maxNumSegments).
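> A rough sketch (the paths and the mainWriter variable are made up):
>
>     import org.apache.lucene.search.*;
>     import org.apache.lucene.store.Directory;
>     import org.apache.lucene.store.FSDirectory;
>
>     // search across the per-day indexes
>     Searchable[] perDay = new Searchable[] {
>         new IndexSearcher("/indexes/2008-01-17"),
>         new IndexSearcher("/indexes/2008-01-18"),
>     };
>     Searcher searcher = new MultiSearcher(perDay);
>
>     // later, fold a day's index into the main index
>     mainWriter.addIndexesNoOptimize(new Directory[] {
>         FSDirectory.getDirectory("/indexes/2008-01-17")
>     });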
>
> > Our mergeFactor=200 and maxMergeDocs=99999
>
> In Lucene 2.3, segment merging is done in background threads.
> Given that, I think you'd want to decrease mergeFactor so that
> merging takes place while you are indexing (test to be sure).
> That should make your optimize call less costly.
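> I.e., something like this (10 is the default; again, test to be
> sure it helps in your setup):
>
>     // a lower mergeFactor means smaller, more frequent merges
>     // that can run concurrently with indexing in 2.3
>     writer.setMergeFactor(10);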
>
> > Thanks,
> > -vivek
