On Saturday 15 December 2007 00:17:10 Chris Hostetter wrote:
> : Actually FuzzyQuery.rewrite() is pretty expensive so why not introduce a
> : caching decorator? A WeakHashMap with key==IndexReader and value==LRU of
> : BooleanQueries.
>
> Applications are certainly welcome to do this (there is nothing to stop
> you from calling rewrite before passing the query to your Searcher, i
> believe the overhead of calling rewrite on a query that's already been
> rewritten is fairly low) but I don't think it would be a good idea to add

Why should subsequent rewrites be faster? The query is rewritten from
scratch every time. Even *if* you profit from buffered IO, you still have
plenty of Levenshtein string operations to perform.

I'm against caching in general because you always run into some problem
that is hard to understand and examine, but this seems to be one of the
rare cases where caching makes sense.

I attached a small test app. The index contains 2.2 million docs and
5 million terms. I search for a pretty common term which was rewritten to
15 terms and hit roughly 4,000 docs (I also tried a term that was
rewritten to 1 term and hit about 300 docs; it made no difference).
Times are in milliseconds:

rewritten in 809
Overall search time: 842
rewritten in 271
Overall search time: 274
rewritten in 216
Overall search time: 219
rewritten in 180
Overall search time: 182
rewritten in 184
Overall search time: 186
rewritten in 220
Overall search time: 226
rewritten in 207
Overall search time: 208
rewritten in 181
Overall search time: 183
rewritten in 183
Overall search time: 185
rewritten in 180
Overall search time: 181

$ vmstat -S M
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0    757    298     56    384    0    0    21    36   39    9  5  1 94  0

> something like this to the core ...for starters we are trying to move
> away from "hidden" caches like this that are not transparent (and

Well, at least the existence of such a decorator (which you explicitly have
to use) will give you a hint that this is a performance hot spot. It took me
quite some time to figure it out...

> controllable) but the users because they have the potential to eat up a
> lot of ram. But also: he amount of time needed to rewrite the query is
> probably not vastly more expensive then the anout of time to execute the
> search .. you might as well cache the entire result keyed off of the
> orriginal query (and not just the rewritten query object).
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

package test;

import java.io.IOException;

import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class Test {

    /** FuzzyQuery that prints how long each rewrite() call takes, in milliseconds. */
    class TracingFuzzyQuery extends FuzzyQuery {
        private static final long serialVersionUID = -7967844853900602465L;

        public TracingFuzzyQuery( final Term term, final float minimumSimilarity )
                throws IllegalArgumentException {
            super( term, minimumSimilarity );
        }

        @Override
        public Query rewrite( final IndexReader reader ) throws IOException {
            final long t0 = System.currentTimeMillis();
            final Query q = super.rewrite( reader );
            System.out.println( "rewritten in " + (System.currentTimeMillis() - t0) );
            return q;
        }
    }

    /**
     * @param args
     */
    public static void main( final String[] args ) throws Exception {
        final IndexSearcher s = new IndexSearcher( FSDirectory.getDirectory( "/tmp/test" ) );
        final Test t = new Test();
        // Run the same fuzzy search ten times against the same searcher.
        for( int i = 0; i < 10; i++ )
            t.go( s );
        s.close();
    }

    /** Builds a fresh fuzzy query and prints the total search time, rewrite included. */
    private void go( final IndexSearcher s ) throws CorruptIndexException, IOException {
        final Term t = new Term( "test", "test" );
        final TracingFuzzyQuery q = new TracingFuzzyQuery( t, 0.75f );
        final long t0 = System.currentTimeMillis();
        s.search( q );
        System.out.println( "Overall search time: " + (System.currentTimeMillis() - t0) );
    }
}
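
For illustration, here is a minimal sketch of the caching decorator idea from the top of this thread (a WeakHashMap keyed by reader, holding an LRU of rewritten queries). It is not Lucene code and not a tested proposal: the class name RewriteCache, the type parameters R (standing in for IndexReader), Q (the original query) and V (the rewritten query), and the rewriter callback are all hypothetical, and it is written against Java 8 or later. It also assumes the query type implements equals() and hashCode() sensibly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.BiFunction;

/**
 * Hypothetical per-reader LRU cache for rewritten queries.
 * R stands in for the index reader, Q for the original query,
 * V for the rewritten query. None of these are Lucene types.
 */
final class RewriteCache<R, Q, V> {

    private final int maxEntriesPerReader;
    private final BiFunction<Q, R, V> rewriter;

    // Weak keys: a reader's cache becomes collectable once nothing else
    // references the reader. Cached values must not reference the reader.
    private final Map<R, Map<Q, V>> perReader = new WeakHashMap<>();

    RewriteCache(final int maxEntriesPerReader, final BiFunction<Q, R, V> rewriter) {
        this.maxEntriesPerReader = maxEntriesPerReader;
        this.rewriter = rewriter;
    }

    /** Returns the cached rewrite for (query, reader), computing it on a miss. */
    synchronized V rewrite(final Q query, final R reader) {
        final Map<Q, V> lru = perReader.computeIfAbsent(reader, r -> newLru());
        V rewritten = lru.get(query);
        if (rewritten == null) {
            rewritten = rewriter.apply(query, reader); // the expensive step
            lru.put(query, rewritten);
        }
        return rewritten;
    }

    /** Access-ordered LinkedHashMap that evicts the least recently used entry. */
    private Map<Q, V> newLru() {
        return new LinkedHashMap<Q, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(final Map.Entry<Q, V> eldest) {
                return size() > maxEntriesPerReader;
            }
        };
    }
}
```

With Lucene this would presumably be wired up as something like new RewriteCache<IndexReader, Query, Query>(100, (q, r) -> q.rewrite(r)), with the IOException handled inside the callback. Because the cache is an explicit object that the application has to create and call, its memory use stays visible and bounded per reader.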