Thanks for your comments, Ulrich. I just posted a message asking if anyone had attempted this approach! Sounds like you have, and it works :-) Thanks for the information; this sounds pretty close to what my preferred approach would be.
This is a good approach if the total number of documents doesn't grow too much. There's obviously a limit to full index runs at some point.
You say you get 2000 docs/minute. I've done some benchmarking and managed to get our data indexing at ~1000 docs/minute on an Athlon 1800+ (and most of that speed was achieved by bumping the IndexWriter.mergeFactor up to 100 or so). Our data is coming from a database table; each record contains about 40 fields, and I'm indexing 8 of those fields (an ID, 4 number fields, and 3 text fields, including one that has ~2k of text). Does this sound reasonable to you, or do you have any tips that might improve that performance?
You need to find out where you lose most of the time:
a) in data access (for example, your database could be too slow; in my case I am scanning the local filesystem)
b) in parsing (probably not an issue when reading from a DB, but in my case it is, since I have HTML files)
c) in indexing
I haven't gone to the trouble to find that out for my app, because it is fast enough the way it is.
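If you do want to measure it, a rough way to see which of a), b) or c) dominates is to wrap each stage in a timer and compare the accumulated totals. The following is only a minimal sketch: the StageTimer class and the stage names "fetch", "parse" and "index" are made up for illustration, and the three lambdas in main are placeholders for your real database read, field extraction and IndexWriter call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper (not part of Lucene): accumulates wall-clock time per named stage.
final class StageTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    // Runs the given work, adds its elapsed time to the stage total, and returns its result.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            add(stage, System.nanoTime() - start);
        }
    }

    // Adds an already measured duration (in nanoseconds) to a stage.
    void add(String stage, long elapsedNanos) {
        nanos.merge(stage, elapsedNanos, Long::sum);
    }

    // Total nanoseconds recorded for a stage, or 0 if the stage was never timed.
    long total(String stage) {
        return nanos.getOrDefault(stage, 0L);
    }

    // Name of the stage with the largest accumulated time, or null if nothing was timed.
    String slowest() {
        String worst = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanos.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int id = 0; id < 1000; id++) {
            final int rowId = id;
            // Placeholder for a): reading one record from the database.
            String row = timer.time("fetch", () -> " record " + rowId + " ");
            // Placeholder for b): parsing or extracting the fields to index.
            String text = timer.time("parse", () -> row.trim());
            // Placeholder for c): adding the document to the index.
            timer.time("index", () -> text.length());
        }
        for (String stage : new String[] {"fetch", "parse", "index"}) {
            System.out.println(stage + ": " + timer.total(stage) / 1_000_000.0 + " ms");
        }
        System.out.println("slowest stage: " + timer.slowest());
    }
}
```

Timing over a few thousand records should be enough to show whether the database, the parsing or the indexing step is worth tuning first.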
However, I do wonder: if you have your data in a database anyway, why not use the database's indexing features? It seems like Lucene is an additional layer on top of your data, which you don't really need.
cheers,
Ulrich