Hi Mekin,

A couple of things:
- You might try increasing maxBufferedDocs to 1000 or so (depending on
your document size).  That controls the size of the smallest segments
and will decrease the number of merges you end up doing (see the sketch
after this list).
- Try using the trunk Lucene version... it has indexing enhancements,
like avoiding some counting overhead when deciding whether segments
should be merged.
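
For example, a minimal sketch (assuming the same Lucene 2.x-style
constructor you are already using; dirname, analyzer, and doTotalIndex
are just the names from your snippet):

    IndexWriter writer = new IndexWriter(dirname, analyzer, doTotalIndex);
    // Buffer more documents in RAM before a new segment is flushed to disk.
    // Larger initial segments mean fewer merges overall.
    writer.setMaxBufferedDocs(1000);

The default is quite small, so with tiny documents the writer flushes
lots of little segments that all have to be merged later.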

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


On 10/27/06, Mekin Maheshwari <[EMAIL PROTECTED]> wrote:
I am creating an index of about 7 million documents.
The total size of the index is about 2.7 GB once indexing is done.

For the first 3 million documents, the indexer takes about 3 hours (can
I do better than this?)
 - about 4 seconds per thousand documents

After this it slows down terribly and takes about 20 seconds for every
thousand documents.


It doesn't seem to be a data issue: if I start creating the index from
the 3 millionth document, I get the initial speed (1k docs
in 3 secs).

What could be going wrong?
I have tried a few things, but I can't iterate quickly because it takes
3 hours before the slowdown happens.

Below are the relevant pieces of the code:

Thanks for the help,
mekin

IndexWriter writer = new IndexWriter(dirname, analyzer, doTotalIndex);
writer.setMergeFactor(1000);

while (moreRecordsToGet()) {   // pseudocode: iterate over source records
    Document doc = new Document();
    // ... add fields to doc
    // ... set boost on doc
    writer.addDocument(doc);
}

writer.optimize();
writer.close();
