I did nearly the exact same thing in my "derived" Lucene. But in order to limit modifications to the Lucene core, I created a QueryCache class, and have derived versions of the prefix and range queries consult it, passing in the IndexReader and the query to see if there is a cached result. I also call QueryCache.clear(IndexReader) when the IndexReader goes out of scope.
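A minimal sketch of what such a QueryCache might look like follows. Only the class name and clear(IndexReader) come from the description above. The get/put methods, the "querycache.maxEntries" property name, the default size, and the use of plain Object keys (so the sketch compiles without Lucene on the classpath) are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a per-reader LRU cache of query results. */
final class QueryCache {
    // Assumed property name and default; the message only says "a property".
    private static final int MAX_ENTRIES =
            Integer.getInteger("querycache.maxEntries", 64);

    // reader -> (query -> matching documents). Keys are Object here so the
    // sketch is self-contained; real code would use IndexReader and Query.
    private static final Map<Object, Map<Object, BitSet>> CACHE = new HashMap<>();

    private QueryCache() {}

    private static Map<Object, BitSet> newLruMap() {
        // accessOrder=true keeps entries ordered from least to most recently used
        return new LinkedHashMap<Object, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, BitSet> eldest) {
                return size() > MAX_ENTRIES;
            }
        };
    }

    /** Returns the cached result for this reader and query, or null on a miss. */
    static synchronized BitSet get(Object reader, Object query) {
        Map<Object, BitSet> perReader = CACHE.get(reader);
        return perReader == null ? null : perReader.get(query);
    }

    /** Stores a result, evicting the least recently used entry if over the limit. */
    static synchronized void put(Object reader, Object query, BitSet bits) {
        CACHE.computeIfAbsent(reader, r -> newLruMap()).put(query, bits);
    }

    /** Drops every cached result for a reader that is going out of scope. */
    static synchronized void clear(Object reader) {
        CACHE.remove(reader);
    }
}
```

A derived query would call get(reader, this) before executing, and put(reader, this, bits) after a miss. This relies on the query class implementing equals() and hashCode() consistently.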
Will there be a problem with associating the cache with the IndexSearcher instances, since it seems that common Lucene code does something like

    IndexSearcher searcher = new IndexSearcher(reader);

every time it needs to perform a search?

It is REALLY efficient for automatic caching of common range queries and prefix queries, as I think many users of Lucene use a range query to look for documents modified in the "last n days". The ONLY overhead is extra memory usage (since without the cache the query needs to be executed as is), but the size of the LRU cache can be controlled via a property.

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 10, 2005 3:40 PM
To: java-dev@lucene.apache.org
Subject: constant scoring queries

Background: In http://issues.apache.org/bugzilla/show_bug.cgi?id=34673, Yonik Seeley proposes a ConstantScoreQuery, based on a Filter. And in http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg08007.html I proposed a mechanism to promote the use of Filters. Through all of this, Paul Elschot has hinted that there might be a better way.

Here's another proposal, tackling many of the same issues:

1. Add two methods to Query.java:

    public boolean constantScoring();
    public void constantScoring(boolean);

   When constantScoring(), the boost() is the score for matches.

2. Add two methods to Searcher.java:

    public BitSet cachedBitSet(Query) { return null; }
    public void cacheBitSet(Query, BitSet) {}

   IndexSearcher overrides these to maintain an LRU cache of bitsets.

3. Modify BooleanQuery so that, when constantScoring(), TooManyClauses is not thrown.

4. Modify BooleanScorer to, if constantScoring(),
   - check Searcher for a cached bitset
   - failing that, create a bitset
   - evaluate clauses serially, saving results in bitset
   - cache the bitset
   - use the bitset to handle doc(), next() and skipTo();

5. TermQuery and PhraseQuery could be similarly modified, so that, when constant scoring, bitsets are cached for very common terms (e.g., >5% of documents).

With these changes, WildcardQuery, PrefixQuery, RangeQuery etc., when declared to be constant scoring, will operate much faster and never throw TooManyClauses. We can add an option (the default?) to QueryParser to make these constant scoring.

Also, instead of BitSet we could use an interface:

    public interface DocIdSet {
      void add(int docId);
      boolean contains(int docId);
      int next(int docId);
    }

to permit sparse representations.

Thoughts?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
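To illustrate how the DocIdSet interface from the quoted proposal could permit both dense and sparse representations, here is a rough sketch with two implementations. The class names BitSetDocIdSet and SparseDocIdSet are hypothetical. The proposal does not define what next(int) returns, so the sketch assumes it returns the smallest doc id in the set that is greater than or equal to the argument, or -1 if there is none.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Interface as written in the proposal (made package-private for the sketch).
interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    // Assumed semantics: smallest member >= docId, or -1 if there is none.
    int next(int docId);
}

/** Dense representation: one bit per document in the index. */
final class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits = new BitSet();

    public void add(int docId) { bits.set(docId); }

    public boolean contains(int docId) { return bits.get(docId); }

    // BitSet.nextSetBit already returns -1 when no later bit is set.
    public int next(int docId) { return bits.nextSetBit(docId); }
}

/** Sparse representation: stores only the matching doc ids, in sorted order. */
final class SparseDocIdSet implements DocIdSet {
    private final TreeSet<Integer> ids = new TreeSet<>();

    public void add(int docId) { ids.add(docId); }

    public boolean contains(int docId) { return ids.contains(docId); }

    public int next(int docId) {
        Integer hit = ids.ceiling(docId); // least element >= docId, or null
        return hit == null ? -1 : hit;
    }
}
```

A scorer written against DocIdSet would not need to know which representation it was given. A query matching a handful of documents could use the sparse form, while one matching a large fraction of the index could use the bitset.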