We are currently doing something similar here.

We have upwards of 15 million documents in our index.

There has been a lot of discussion on this in the past... But I'll give a few details:

My current technique for indexing very large amounts of data is to:

Set mergeFactor to 90
Leave maxMergeDocs alone (the Lucene default)

Index 100,000 documents into an index, with no optimize, then close it.  (The right 
batch size depends on the file size and number of fields in your documents; if you 
have a lot of fields, I believe you will run out of file handles sooner.)  Open a new 
index, write the next 100,000 docs into it, and so on.  If you have more than one 
machine or more than one disk drive, running these processes in parallel helps a lot.  
A rough sketch of the loop is below.
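
For concreteness, here is a minimal sketch of that loop, assuming a Lucene 2.x-era 
IndexWriter API.  The BatchIndexer class, the /data/index-N directory layout, and the 
document source are hypothetical placeholders, not the wrapper I mention further down.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Hypothetical batch indexer: 100,000 docs per index, then start a new one.
public class BatchIndexer {
    private static final int DOCS_PER_INDEX = 100000;

    public static void indexAll(Iterable<Document> docs) throws Exception {
        int count = 0;
        int indexNum = 0;
        IndexWriter writer = newWriter(indexNum);
        for (Document doc : docs) {
            writer.addDocument(doc);
            if (++count % DOCS_PER_INDEX == 0) {
                writer.close();                  // no optimize() -- just close it
                writer = newWriter(++indexNum);  // open a fresh index for the next batch
            }
        }
        writer.close();
    }

    private static IndexWriter newWriter(int indexNum) throws Exception {
        // One directory per partial index: /data/index-0, /data/index-1, ...
        IndexWriter w = new IndexWriter("/data/index-" + indexNum,
                                        new StandardAnalyzer(), true);
        w.setMergeFactor(90);  // merge factor 90, as above
        // maxMergeDocs is left at the Lucene default
        return w;
    }
}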

After you have a whole bunch of 100,000-document indexes, merge them all into one 
index.  You never have to call optimize - the index comes out optimized as part of the 
merge.
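
The final merge is then just IndexWriter.addIndexes(), which optimizes as part of the 
merge.  Again a sketch against the same 2.x-era API and the same hypothetical directory 
layout; the 150 partial indexes is just 15 million docs at 100,000 per index.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical final merge of the partial indexes into one big index.
public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        int numPartialIndexes = 150;  // e.g. 15 million docs / 100,000 per index
        Directory[] parts = new Directory[numPartialIndexes];
        for (int i = 0; i < numPartialIndexes; i++) {
            parts[i] = FSDirectory.getDirectory("/data/index-" + i);
        }

        IndexWriter merged = new IndexWriter("/data/index-merged",
                                             new StandardAnalyzer(), true);
        merged.addIndexes(parts);  // the merge also optimizes; no separate optimize() call
        merged.close();
    }
}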

I have written an entire wrapper around Lucene which handles all of this for me... 
though I did it at work, and I don't know if I can release the code.  I think a couple 
of other people have done similar things and have released their code.

Dan
