On Sat, Jul 18, 2009 at 6:54 AM, Uwe Schindler<u...@thetaphi.de> wrote:
> I did some perf tests with the well-known PerfTest.java from the
> FieldCacheRangeFilter JIRA issue.
>
> I compared a 5 million doc index with precStep=4:
>
> With constant score rewrite:
> avg number of terms: 68.3
> TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.64312909999998 ms; sum=31994466
>
> With boolean rewrite:
> avg number of terms: 68.3
> TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms; sum=31994466
>
> Both numbers were taken after some warm-up queries, and the rand seed was
> identical (so exactly the same queries). For this index size, constant
> score still looks faster than Boolean rewrite.

OK these are good results; thanks for running them!

> Especially the warm-up queries take much longer with Boolean rewrite.
> The problem with my test here is that the whole index seems to be in
> the OS cache. If it is not in the OS cache, I think the much longer
> time that the first Boolean queries took will become more important.

Agreed.

> In my opinion, we should keep constant score enabled.

OK +1

> My main problem with Boolean rewrite is the completely useless scoring.
> A range query should always have constant score. We could maybe fix this
> some time in the future, so that you can disable scorers for Boolean
> queries (e.g. bq.setDoConstantScore(true)). I think this is part of this
> special issue in JIRA (do not know the number yet).

I completely agree; we need to make it possible to use the BooleanQuery
expansion method with constant scoring (I opened an issue for this
already -- LUCENE-1644).

> A second problem with Boolean rewrite: with precStep=4, it is guaranteed
> that the query will not hit the 1024 max clause problem (see the formula
> with the theoretical max term number) - so no problem at all.

Right.

> The problem starts if you combine two or three numeric queries with
> BooleanClause.Occur.MUST in a top-level Boolean query (the typical
> example of a geo query). In this case, the Boolean queries that only
> consist of MUST may be combined into one big one (correct me if I am
> wrong) and then the max clause count becomes a problem.

Actually Lucene never does structural optimizations of BooleanQuery, and
I think it should (though scores would be different). One exception: if
the BooleanQuery has a single clause, it'll rewrite itself to the rewrite
of that one sub-query.

> If we change the default, keep in mind to reopen SOLR-940, as it assumes
> constant score mode by default and solr's default precStep is 8 ->
> *bang*. Maybe the solr people should fix this and still explicitly set
> the mode for all range queries.

Let's not change the default :)

Mike
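
For reference, the clause-count arithmetic discussed above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up, the bound is the theoretical maximum as I recall it from the NumericRangeQuery javadocs, (bitsPerValue/precStep - 1) * (2^precStep - 1) * 2 + (2^precStep - 1), and 64-bit values are assumed. The "flattened" figure models the merge Uwe worries about, which, as Mike notes, Lucene does not actually perform.

```java
// Hypothetical helper, not part of Lucene: worst-case term counts for trie ranges.
final class TrieTermBound {
    // Default BooleanQuery clause limit mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    // Theoretical maximum number of terms a single trie range can expand to.
    // Assumes precStep divides bitsPerValue evenly.
    static int maxTerms(int bitsPerValue, int precStep) {
        int perLevel = (1 << precStep) - 1;   // 255 for precStep=8, 15 for precStep=4
        int levels = bitsPerValue / precStep; // number of stored precisions
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    // Clause count if several MUST range queries were merged into one
    // BooleanQuery (a hypothetical flattening; Lucene does not do this).
    static int flattenedClauses(int queries, int bitsPerValue, int precStep) {
        return queries * maxTerms(bitsPerValue, precStep);
    }

    public static void main(String[] args) {
        for (int precStep : new int[] {4, 8}) {
            int terms = maxTerms(64, precStep);
            System.out.println("precStep=" + precStep + " maxTerms=" + terms
                    + " overLimit=" + (terms > MAX_CLAUSES));
        }
        System.out.println("3 flattened ranges, precStep=4: "
                + flattenedClauses(3, 64, 4));
    }
}
```

Under these assumptions a single 64-bit range with precStep=4 stays well below 1024 clauses, precStep=8 can exceed it on its own, and three precStep=4 ranges would only exceed it if they were merged into one query.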