[ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368784 ]
Steven Tamm commented on LUCENE-502:
------------------------------------

> The conjunctive scorer does not call score(HitCollector,int). This is only
> called in a few cases anymore.

However, in your comments to LUCENE-505 you said this: "For example, in TermScorer.score(HitCollector, int), Lucene's innermost loop, you change two array accesses into a call to an interface. That could make a substantial difference." Which is true? Or, as seems likely, was TermScorer optimized for a case that no longer applies (i.e. ConjunctiveScorer)?

> If that were the case, then the termDocs(int[], int[]) method would never
> have been added!

That reasoning hasn't held for at least 3 years. Inlining by hand is no longer necessary with HotSpot (I don't know about gcj). Run a benchmark on JDK 1.5 to prove this to yourself.

In short, we should have two TermScorer implementations: one for terms that occur in few documents, and one for terms that occur in many.

> TermScorer caches values unnecessarily
> --------------------------------------
>
>          Key: LUCENE-502
>          URL: http://issues.apache.org/jira/browse/LUCENE-502
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: 1.9
>     Reporter: Steven Tamm
>  Attachments: TermScorer.patch
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for
> each term scored. When querying for a lot of terms, this creates a lot of
> unnecessary garbage. The SegmentTermDocs from which it retrieves its
> information doesn't have any optimizations for bulk loading, so the caching
> buys nothing.
> In addition, it has a SCORE_CACHE that is of limited benefit. It caches
> the result of a sqrt that should be placed in DefaultSimilarity, and if
> you're only scoring a few documents that contain those terms, there's no need
> to precalculate the SQRT, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not
> cache the docs or freqs.
> In the case of a lot of queries, that saves 196 bytes/term, the unnecessary
> disk IO, and the extra SQRTs, which add up.
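To make the SCORE_CACHE argument concrete, here is a minimal Java sketch contrasting a precomputed tf-times-weight table with computing sqrt(freq) per scored document. It is not Lucene's TermScorer: the class names are hypothetical, and the 32-entry size and sqrt-based tf are assumptions taken from the description above (DefaultSimilarity's tf is the square root of the frequency). Both variants return identical scores; the cached one spends 32 sqrt calls and a float[32] per term up front, which presumably only pays off when the term matches many documents.

```java
/**
 * Hypothetical sketch (not Lucene's code): a per-term cache of
 * tf(freq) * weight versus computing sqrt(freq) on demand.
 */
public class ScoreCacheSketch {
    // Assumed cache size, matching the 32-entry cache described in the issue.
    static final int CACHE_SIZE = 32;

    // Assumed tf function: DefaultSimilarity-style sqrt(freq).
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** Cached variant: pays CACHE_SIZE sqrt calls and one array per term. */
    static final class CachedScorer {
        private final float weight;
        private final float[] cache = new float[CACHE_SIZE];

        CachedScorer(float weight) {
            this.weight = weight;
            for (int i = 0; i < CACHE_SIZE; i++) {
                cache[i] = tf(i) * weight;
            }
        }

        float score(int freq) {
            // Small frequencies hit the table; large ones fall back to sqrt.
            return freq < CACHE_SIZE ? cache[freq] : tf(freq) * weight;
        }
    }

    /** Uncached variant: one sqrt per scored document, no per-term array. */
    static final class DirectScorer {
        private final float weight;

        DirectScorer(float weight) {
            this.weight = weight;
        }

        float score(int freq) {
            return tf(freq) * weight;
        }
    }

    public static void main(String[] args) {
        CachedScorer cached = new CachedScorer(0.5f);
        DirectScorer direct = new DirectScorer(0.5f);
        for (int freq : new int[] {1, 4, 31, 32, 100}) {
            System.out.println(freq + ": " + cached.score(freq) + " vs " + direct.score(freq));
        }
    }
}
```

If only a handful of documents are scored per term, the constructor's 32 sqrt calls can exceed the sqrt calls the cache saves, which is the trade-off the issue points at.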
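The disagreement over bulk array reads versus per-document interface calls, and the proposal to keep two TermScorer implementations, can be sketched as follows. Postings is a hypothetical interface loosely modeled on the next()/doc()/freq() calls and the bulk int[] read discussed above, not Lucene's actual API, and the BULK_THRESHOLD cutoff of 32 is an arbitrary assumption; a real cutoff would have to come from a benchmark, as the comment suggests.

```java
import java.util.ArrayList;
import java.util.List;

public class TermIterationSketch {
    /** Hypothetical posting-list interface, loosely modeled on TermDocs. */
    interface Postings {
        boolean next();                    // advance to the next document
        int doc();                         // current document number
        int freq();                        // term frequency in the current document
        int read(int[] docs, int[] freqs); // bulk read; returns count filled, 0 at end
    }

    /** In-memory postings used only for this example. */
    static final class ArrayPostings implements Postings {
        private final int[] docs, freqs;
        private int pos = -1;

        ArrayPostings(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }

        public boolean next() { return ++pos < docs.length; }
        public int doc() { return docs[pos]; }
        public int freq() { return freqs[pos]; }

        public int read(int[] d, int[] f) {
            int n = 0;
            while (n < d.length && pos + 1 < docs.length) {
                pos++;
                d[n] = docs[pos];
                f[n] = freqs[pos];
                n++;
            }
            return n;
        }
    }

    /** Receives (doc, freq) pairs; stands in for a HitCollector. */
    interface Visitor { void visit(int doc, int freq); }

    /** Strategy A: buffer 32 docs/freqs at a time via bulk reads. */
    static void visitBuffered(Postings p, Visitor v) {
        int[] docs = new int[32];  // these two per-term arrays are the
        int[] freqs = new int[32]; // allocation the issue objects to
        int n;
        while ((n = p.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) v.visit(docs[i], freqs[i]);
        }
    }

    /** Strategy B: no buffer; one next()/doc()/freq() round per document. */
    static void visitUnbuffered(Postings p, Visitor v) {
        while (p.next()) v.visit(p.doc(), p.freq());
    }

    /** Hypothetical cutoff between "few docs per term" and "many docs per term". */
    static final int BULK_THRESHOLD = 32;

    static void visit(Postings p, int docFreq, Visitor v) {
        if (docFreq >= BULK_THRESHOLD) visitBuffered(p, v);
        else visitUnbuffered(p, v);
    }

    public static void main(String[] args) {
        int[] docs = new int[100], freqs = new int[100];
        for (int i = 0; i < 100; i++) { docs[i] = i * 3; freqs[i] = 1 + i % 5; }
        List<Integer> a = new ArrayList<>(), b = new ArrayList<>();
        visit(new ArrayPostings(docs, freqs), 100, (d, f) -> a.add(d)); // takes the buffered path
        visit(new ArrayPostings(docs, freqs), 5, (d, f) -> b.add(d));   // takes the unbuffered path
        System.out.println("same docs visited: " + a.equals(b));
    }
}
```

Whether the buffered path is actually faster on a given JVM is exactly what the thread disputes; the sketch only shows that both strategies visit the same (doc, freq) sequence, so the choice can be made per term without changing results.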
