Hi all,

I just started using Lucene to index approximately 13,000 XML documents of biological data; each document is approximately 20-30KB.

I modified some code from the Cocoon components to use SAX to parse my documents and build Lucene Documents. That part is very quick.
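For reference, the handler does roughly the following (element and field names here are simplified placeholders, not the actual schema). All of the Documents end up in docList before indexing starts:

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Collects one Lucene Document per <record> element (names are placeholders)
public class RecordHandler extends DefaultHandler {

    private StringBuffer text = new StringBuffer();
    private Document doc;
    private List docList = new ArrayList();

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if ("record".equals(qName)) {
            doc = new Document();
            text.setLength(0);
        }
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if ("record".equals(qName) && doc != null) {
            // tokenized, indexed, stored field holding the record text
            doc.add(Field.Text("contents", text.toString()));
            docList.add(doc);
        }
    }

    public List getDocuments() {
        return docList;
    }
}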

Here is the code I started with to write the index to disk:

// create a new index on disk (fsd is an FSDirectory opened earlier)
writer = new IndexWriter(fsd, analyzer, true);

// add each pre-built Lucene Document and print progress
int counter = 0;
Iterator myit = docList.iterator();
while (myit.hasNext()) {
    writer.addDocument((Document)myit.next());
    System.out.println(++counter);
}
writer.close();

This is taking much longer than expected. I'm using the StandardAnalyzer. Indexing takes approximately 2-3 seconds per document, and it gets significantly slower as the index grows. I'm running this on a 2.4GHz Linux machine with 1GB of RAM.

I have tried a few different strategies, but I keep ending up with "too many open files" exceptions.
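To give an idea of what I mean by strategies, one variant was roughly along these lines -- raising mergeFactor so more documents are buffered before segments get merged (the exact value here is just an example, not necessarily what I used):

// hypothetical setting -- a larger mergeFactor defers merges, but keeps
// more segment files open at once, which seems to be where the
// open-file errors come from
writer = new IndexWriter(fsd, analyzer, true);
writer.mergeFactor = 100;   // default is 10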

I wouldn't expect the indexing to slow down progressively in proportion to the size of the index. Is that assumption wrong?

Am I doing something wrong? Is there a way to use memory more and the filesystem less, and just dump the index to disk periodically?
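For example, would something along these lines be a reasonable approach -- building batches in a RAMDirectory and merging each batch into the disk index with addIndexes()? (Untested sketch; the batch size of 1000 is just a guess.)

// build batches of documents in memory, then merge each batch into the
// on-disk index (fsd, analyzer and docList as in the code above;
// RAMDirectory and Directory are from org.apache.lucene.store)
IndexWriter fsWriter = new IndexWriter(fsd, analyzer, true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

int counter = 0;
Iterator myit = docList.iterator();
while (myit.hasNext()) {
    ramWriter.addDocument((Document)myit.next());
    if (++counter % 1000 == 0) {          // batch size is a guess
        // flush the in-memory batch to disk and start a new one
        ramWriter.close();
        fsWriter.addIndexes(new Directory[] { ramDir });
        ramDir = new RAMDirectory();
        ramWriter = new IndexWriter(ramDir, analyzer, true);
    }
}
ramWriter.close();
fsWriter.addIndexes(new Directory[] { ramDir });
fsWriter.close();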

Any help would be appreciated. Thanks,

Marc Dumontier
Intermediate Developer
Blueprint Initiative
Mount Sinai Hospital
http://www.bind.ca


