Chris Miller wrote:
Thanks for your comments, Ulrich. I just posted a message asking if anyone
had attempted this approach! Sounds like you have, and it works :-)  Thanks
for the information, this sounds pretty close to what my preferred approach
would be.

This is a good approach as long as the total number of documents doesn't grow too much. At some point a full index run over everything will take too long.


You say you get 2000 docs/minute. I've done some benchmarking and managed to
get our data indexing at ~1000/minute on an Athlon 1800+ (and most of that
speed was achieved by bumping the IndexWriter.mergeFactor up to 100 or so).
Our data is coming from a database table, each record contains about 40
fields, and I'm indexing 8 of those fields (an ID, 4 number fields, 3 text
fields including one that has ~2k text). Does this sound reasonable to you,
or do you have any tips that might improve that performance?

You need to find out where you lose most of the time:


a) in data access (your database could be too slow, for example; in my case I am scanning the local filesystem)
b) in parsing (probably not an issue when reading from a DB, but it is in my case, since I have HTML files)
c) in indexing
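
One rough way to get those three numbers is to wrap each phase in a timer and add up the elapsed time per phase over a batch of records. The sketch below is only an illustration of that idea, not something taken from either of our applications: PhaseTimer, fetchRecord, parseRecord and indexRecord are hypothetical names, and the three stub methods stand in for your real database read, field extraction, and Lucene indexing call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Accumulates wall-clock time per named phase (hypothetical helper, not a Lucene class).
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the given work, adds its elapsed time to the phase total, returns its result.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a phase, or 0 if the phase was never timed.
    long totalNanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    // Name of the phase with the largest accumulated time, or null if nothing was timed.
    String slowestPhase() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    // Prints each phase with its total milliseconds and its share of the overall time.
    void report() {
        long total = 0;
        for (long n : nanosByPhase.values()) {
            total += n;
        }
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            double ms = e.getValue() / 1_000_000.0;
            double share = total == 0 ? 0 : 100.0 * e.getValue() / total;
            System.out.printf("%-8s %10.1f ms  %5.1f%%%n", e.getKey(), ms, share);
        }
    }

    // Placeholder stubs: replace with the real DB read, parsing, and indexing code.
    static String fetchRecord(int id) { return id + "|42|some text for record " + id; }
    static String[] parseRecord(String raw) { return raw.split("\\|"); }
    static int indexRecord(String[] fields) { return fields.length; }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            String raw = timer.time("access", () -> fetchRecord(id));
            String[] fields = timer.time("parse", () -> parseRecord(raw));
            timer.time("index", () -> indexRecord(fields));
        }
        timer.report();
        System.out.println("slowest phase: " + timer.slowestPhase());
    }
}
```

With the stubs replaced by the real calls, whichever phase takes the largest share is the one worth tuning first. Wall-clock timing like this is coarse, so run it over a few thousand records rather than a handful.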


I haven't gone to the trouble to find that out for my app, because it is fast enough the way it is.

One thing I wonder, though: if you have your data in a database anyway, why not use the database's own indexing features? It seems like Lucene is an additional layer on top of your data that you may not really need.

cheers,

Ulrich


