Hi all,
I just started trying to use Lucene to index approximately 13,000 XML
documents containing biological data; each document is roughly 20-30KB.
I modified some code from the Cocoon components to use SAX to parse my
documents and create Lucene Documents; that part is very quick.
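The handler is roughly along these lines (heavily simplified; the "record" element name and the per-element fields are just placeholders for my real schema):

// Simplified sketch of the SAX handler; element/field names are placeholders.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RecordHandler extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();
    private Document doc;

    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("record".equals(qName)) {       // placeholder for the top-level element
            doc = new Document();
        }
        text.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);     // accumulate element text
    }

    public void endElement(String uri, String local, String qName) {
        if (doc != null && !"record".equals(qName)) {
            doc.add(Field.Text(qName, text.toString()));   // one Lucene field per element
        }
    }

    public Document getDocument() {
        return doc;
    }
}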
The following is the code I started with to write the index to disk:
IndexWriter writer = new IndexWriter(fsd, analyzer, true);  // fsd is the on-disk FSDirectory
int counter = 0;
Iterator myit = docList.iterator();
while (myit.hasNext()) {
    writer.addDocument((Document)myit.next());
    System.out.println(++counter);
}
writer.close();
This is taking much longer than expected. I'm using the StandardAnalyzer,
and indexing takes approximately 2-3 seconds per document, getting
significantly slower as the index grows. I'm running this on a 2.4GHz
Linux machine with 1GB of RAM.
I tried a few different strategies, but I end up with "too many open
files" exceptions.
I wouldn't expect indexing to slow down progressively in proportion to the
size of the index. Is this assumption wrong?
Am I doing something wrong? Is there a way to use memory more and the
filesystem less, and just dump the index to disk periodically?
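For example, is something along these lines the intended approach? This is just a sketch using RAMDirectory and IndexWriter.addIndexes(Directory[]); I haven't tried it, and the batch size is a guess:

// build each batch in a RAMDirectory instead of writing straight to disk
RAMDirectory ramDir = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
IndexWriter fsWriter = new IndexWriter(fsd, analyzer, true);
int count = 0;
Iterator it = docList.iterator();
while (it.hasNext()) {
    ramWriter.addDocument((Document)it.next());
    if (++count % 1000 == 0) {                         // batch size is a guess
        ramWriter.close();
        fsWriter.addIndexes(new Directory[] { ramDir });  // dump the batch to disk
        ramDir = new RAMDirectory();
        ramWriter = new IndexWriter(ramDir, analyzer, true);
    }
}
ramWriter.close();
fsWriter.addIndexes(new Directory[] { ramDir });       // flush the last partial batch
fsWriter.optimize();
fsWriter.close();

I also see the public mergeFactor field on IndexWriter, but I'm not sure whether raising it would help or whether it's related to the open-files problem I mentioned.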
Any help would be appreciated. Thanks,
Marc Dumontier
Intermediate Developer
Blueprint Initiative
Mount Sinai Hospital
http://www.bind.ca