Hi folks,
I have done some performance tests for TermVectors and the new TermDocs.skipTo() implementation, both introduced with 1.4. I am very pleased with the results. I ran these tests on the Reuters news corpus (roughly 800,000 documents).
*) I compared TermVectors with the alternative of storing the respective fields and re-analyzing the documents in order to get their terms. According to my measurements, TermVectors speed up access to the terms by a factor of 7!
*) For testing skipTo, I used my implementation for getting highly correlated terms. To compute the correlation measure I have to compare a lot of TermDocs lists with each other or with other lists of document ids. According to my measurements on an optimized index, skipTo speeds up my term correlation implementation by a factor of 2. The benefit of skipTo probably increases with index size.
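To illustrate why skipTo helps when comparing document lists, here is a minimal, self-contained sketch. It is not my actual correlation code and it does not use Lucene itself: PostingList is a hypothetical stand-in for TermDocs backed by a sorted int array, its skipTo() is approximated with a binary search instead of Lucene's on-disk skip data, and countCommon is an illustrative name. It only counts the documents two lists share, which is the kind of raw count a correlation measure might start from.

```java
import java.util.Arrays;

// Sketch only: PostingList mimics the next()/doc()/skipTo() shape of TermDocs
// over an in-memory sorted array. It is an assumption for illustration,
// not Lucene's implementation.
class SkipToSketch {

    static final class PostingList {
        private final int[] docs; // document ids, sorted ascending, no duplicates
        private int pos = -1;     // positioned before the first entry

        PostingList(int[] docs) { this.docs = docs; }

        // Move to the next entry; false when the list is exhausted.
        boolean next() { pos++; return pos < docs.length; }

        // Current document id; only valid after next()/skipTo() returned true.
        int doc() { return docs[pos]; }

        // Move to the first entry beyond the current one whose id is >= target.
        // Binary search stands in for real skip data here.
        boolean skipTo(int target) {
            int from = Math.min(pos + 1, docs.length);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return pos < docs.length;
        }
    }

    // Count the documents that appear in both lists. Whichever list is behind
    // jumps straight to the other list's current document instead of
    // stepping through every entry in between.
    static int countCommon(PostingList a, PostingList b) {
        int count = 0;
        if (!a.next() || !b.next()) return 0;
        while (true) {
            if (a.doc() == b.doc()) {
                count++;
                if (!a.next() || !b.next()) return count;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return count;
            } else {
                if (!b.skipTo(a.doc())) return count;
            }
        }
    }

    public static void main(String[] args) {
        PostingList a = new PostingList(new int[] {1, 3, 5, 7, 9, 100});
        PostingList b = new PostingList(new int[] {3, 4, 9, 50, 100, 200});
        System.out.println(countCommon(a, b));
    }
}
```

The longer and sparser the lists are relative to their overlap, the more entries each jump skips, which fits the expectation that the benefit grows with index size.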
regards, Christoph
