vivek sar wrote:

Hi,

  We are using Lucene 2.2. We have an index of size 70G (within 3-4
days) and growing. We run optimize pretty frequently (once every hour
- due to large number of index updates every min - can be up to 100K
new documents every min). I have seen every now and then the optimize
takes 3-4 hours to complete and up to 8 G memory (our limit). This
makes the whole system slow. Few questions,

1) Is there any alternative to optimize? That is, can we do without
optimize and still have our search fast?

In Lucene 2.3 (coming out shortly) there is a new "partial optimize" method that takes an int maxNumSegments. It will optimize your index down to that many segments. This let's you reduce cost of optimizing while still getting faster searching. Maybe try that?

Also, Lucene 2.3 has sizable speedups to indexing, and uses RAM buffering more efficiently. This will let you hold more documents in RAM before flushing a new segment, which in turn should reduce your merging cost.

2) What's the best way to use optimize, i.e. how can we make the
optimize much faster and use lesser memory?

Fundamentally optimize is quite time consuming because it has to do massive segment merging (the final merge being the worst).

However, it is not supposed to be so memory consuming. How are you measuring memory usage? (Try using java -verbose:gc to see actual heap usage after full GC). What kind of documents are you creating? EG one known memory issue is if you have many diverse fields, all with norms enabled. Norms are not stored sparsely, so, this will consume alot of RAM during optimize and during searching.

3) Is there a way to partition the indexes using Lucene? Let's say we
partition daily, so we have to optimize only the daily indexes and not
the whole thing.

Yes, you can do this, and run searches over these indices (use MultiSearcher). You can then merge them into a single index, using addIndexesNoOptimize. But, this is not really different from using optimize(int maxNumSegments).

Our mergefactor=200 and maxMergeDocs=99999

In Lucene 2.3, segment merging is done in a background thread(s). Given that, I think you'd want to decrease mergeFactor so that merging is taking place while you are indexing. (Test to be sure). That should make your optimize call less costly.

Thanks,
-vivek

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to