Re: performance feedback

Michael McCandless Wed, 09 Jul 2008 07:38:46 -0700


This is great to hear!

If you tweak things a bit (increase RAM buffer size, useautoCommit=false, use threads, etc) you should be able to eke out somemore gains...


Are you storing fields & using term vectors on any of your fields?

Mike

Beard, Brian wrote:


I just did an update from lucene 2.2.0 to 2.3.2 and thought I'd give
some kudos for the indexing performance enhancements.

The lucene indexing portion is about 6-8 times faster. Previously we
were doing ~60-120 documents per second, now we're between 400-1000,

depending on the type of document, size, and how many fields thereare.

Haven't done a formal comparison side by side, but certainly is
substantially faster.

The gain would have been equal to the 8-10 times in the readme, but
using custom tokenizers slows things down a little vs. using the

standard one. At first I didn't realize to use reusableTokenFilterwhich

bypassed the custom tokenizers and had the 8-10x improvement. Maybe
there's some more gain to be had I can pursue.

We index from 5 to 20 fields per document (in serial through

IndexWriter). Most are 3-5K total size, but can vary quite a bit.Total

index size (eventually) will be ~15G.

Thanks,

Brian Beard





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: performance feedback

Reply via email to