I just updated from Lucene 2.2.0 to 2.3.2 and thought I'd give some kudos for the indexing performance enhancements.
The Lucene indexing portion is about 6-8 times faster. Previously we were doing ~60-120 documents per second; now we're between 400 and 1000, depending on the type of document, its size, and how many fields there are. I haven't done a formal side-by-side comparison, but it is certainly substantially faster.

The gain would have matched the 8-10x in the readme, but using custom tokenizers slows things down a little vs. using the standard one. At first I didn't realize that using reusableTokenStream bypassed the custom tokenizers; that's where the 8-10x improvement came from. Maybe there's some more gain to be had there that I can pursue.

We index from 5 to 20 fields per document (in serial, through IndexWriter). Most documents are 3-5K in total size, but that can vary quite a bit. Total index size will eventually be ~15G.

Thanks,
Brian Beard
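
P.S. In case the tokenizer-reuse part is useful to anyone else: in 2.3 an Analyzer can override reusableTokenStream() so that each thread keeps one tokenizer chain and just resets it per field, instead of allocating a new chain every time. A simplified sketch of the pattern (this isn't our actual analyzer; the StandardTokenizer/LowerCaseFilter chain below is a stand-in for our custom one):

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class ReusableChainAnalyzer extends Analyzer {

    // One tokenizer + filter chain, stored per thread by Analyzer.
    private static final class SavedStreams {
        Tokenizer tokenizer;
        TokenStream filteredStream;
    }

    // Fallback path: builds a fresh chain on every call.
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new StandardTokenizer(reader));
    }

    // Reuse path (new in 2.3). getPreviousTokenStream() /
    // setPreviousTokenStream() keep the chain in a ThreadLocal
    // inside Analyzer, so this stays thread-safe.
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        SavedStreams streams = (SavedStreams) getPreviousTokenStream();
        if (streams == null) {
            streams = new SavedStreams();
            streams.tokenizer = new StandardTokenizer(reader);
            streams.filteredStream = new LowerCaseFilter(streams.tokenizer);
            setPreviousTokenStream(streams);
        } else {
            // Re-point the existing tokenizer at the new field's Reader.
            streams.tokenizer.reset(reader);
        }
        return streams.filteredStream;
    }
}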
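
The indexing loop itself is nothing special: one Document per record, added through a single IndexWriter. Roughly like the following (the field names, path, and buffer size are made up for the example; 2.3 flushes by RAM used instead of a fixed doc count):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class IndexingLoop {
    public static void main(String[] args) throws Exception {
        // true = create a new index; substitute your own analyzer.
        IndexWriter writer = new IndexWriter(
            FSDirectory.getDirectory("/tmp/demo-index"),
            new StandardAnalyzer(), true);

        // The 2.3 default is 16 MB; raising it can help throughput.
        writer.setRAMBufferSizeMB(32);

        // In our case, 5-20 fields per document.
        Document doc = new Document();
        doc.add(new Field("id", "42",
            Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("title", "Some title text",
            Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("body", "A few KB of body text ...",
            Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.close();
    }
}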