Goel, Nikhil writes (4/4/2005 7:14 PM):

Hi,



I have been using lucene-1.3.jar for quite some time, and we are using another library to store the index in a database.

When we started indexing, writer.optimize() took 600-800 milliseconds to return. Our index has since grown to around 10 MB, and writer.optimize() now takes around 30-40 seconds, which is not acceptable for our solution. I timed the individual calls, and writer.optimize() is the one that takes most of this time.



So I am wondering whether anyone else has run into the same problem when 
indexing data into an index that is already large, or whether there is another 
way to manage such a large index.



Here is the simple code which we use to index the data.

// Open an IndexWriter on the existing index (create = false)
IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false);

// doc is of type org.apache.lucene.document.Document
writer.addDocument(doc);

// optimize() is called on the IndexWriter. This is the call that takes
// most of the time and is responsible for the delay.
writer.optimize();

// Close the IndexWriter
writer.close();


Does this code imply you are optimizing after every new document is indexed? 10MB is actually a pretty small index. Depending on your inflow of documents, you should be able to optimize maybe once a day, during your application's least busy period. Your IndexSearcher can still search your documents effectively while the index is unoptimized. As a first step, try not optimizing at all.
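As a rough illustration of that suggestion, here is a minimal sketch that adds a whole batch of documents through one writer and calls optimize() at most once, at the end, instead of once per document. It is only a sketch under stated assumptions: the Writer interface is a hypothetical stand-in for the three IndexWriter calls shown above (addDocument, optimize, close), and BatchIndexer, CountingWriter and the optimizeAtEnd flag are made-up names, not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the three IndexWriter calls used in this thread. */
interface Writer {
    void addDocument(String doc);
    void optimize();
    void close();
}

/** Adds a batch through a single writer and optimizes at most once. */
final class BatchIndexer {
    static void indexBatch(Writer writer, List<String> docs, boolean optimizeAtEnd) {
        try {
            for (String doc : docs) {
                writer.addDocument(doc);   // no optimize() inside the loop
            }
            if (optimizeAtEnd) {
                writer.optimize();         // e.g. only in the nightly quiet period
            }
        } finally {
            writer.close();                // always release the writer
        }
    }
}

/** Fake writer that only counts calls, so the sketch runs without Lucene. */
final class CountingWriter implements Writer {
    int adds;
    int optimizes;
    int closes;

    public void addDocument(String doc) { adds++; }
    public void optimize() { optimizes++; }
    public void close() { closes++; }
}

final class BatchIndexerDemo {
    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        BatchIndexer.indexBatch(writer, Arrays.asList("a", "b", "c"), false);
        System.out.println("adds=" + writer.adds
                + " optimizes=" + writer.optimizes
                + " closes=" + writer.closes);
    }
}
```

With a real IndexWriter the same shape would apply: pass optimizeAtEnd = false during normal operation and true only from a scheduled job that runs when the application is least busy.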

Chuck





The time taken by the optimize call grows a lot as the index gets larger. I tried to 
look it up in the book by Erik Hatcher and Otis Gospodnetić 
<http://www.manning.com/hatcher2#author#author> too, but everywhere it says 
Lucene is quite scalable and doesn't have trouble indexing even huge amounts of data. Can anyone 
please provide some insight into this?



Thanks.

Nikhil












