On Sat, Jul 18, 2009 at 6:54 AM, Uwe Schindler<u...@thetaphi.de> wrote:

> I did some perf tests with the well-known PerfTest.java from the
> FieldCacheRangeFilter JIRA issue.
>
> I tested a 5 million document index with precStep=4:
>
> With constant score rewrite:
> avg number of terms: 68.3
> TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.64312909999998
> ms; sum=31994466
>
> With boolean rewrite:
> avg number of terms: 68.3
> TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms;
> sum=31994466
>
> Both numbers were taken after some warm-up queries, and the random seed was
> identical (so exactly the same queries). For this index size, constant score
> rewrite still looks faster than Boolean rewrite.

OK these are good results; thanks for running them!

> The warming queries especially take much longer
> with Boolean rewrite. The problem with my test here is that the whole index
> seems to be in the OS cache. If it were not, I think the much longer
> time the first Boolean queries took would matter even more.

Agreed.

> In my opinion, we should keep constant score enabled.

OK +1

> My main problem with
> Boolean rewrite is the completely useless scoring. A range query should
> always have constant score. We could maybe fix this in the future so
> that you can disable scoring for Boolean queries (e.g.
> bq.setDoConstantScore(true)). I think this is part of a dedicated issue in
> JIRA (do not know the number yet).

I completely agree; we need to make it possible to use the BooleanQuery
expansion method with constant scoring (I already opened an issue for
this -- LUCENE-1644).

> A second point about Boolean rewrite: with precStep=4, the query is
> guaranteed not to hit the 1024 max clause limit (see the formula for
> the theoretical max term number), so no problem there.

Right.
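
For reference, the "theoretical max term number" mentioned above can be
sketched as below. This assumes the worst-case bound as I recall it from the
NumericRangeQuery javadoc, n = (bitsPerValue/precStep - 1) * (2^precStep - 1)
* 2 + (2^precStep - 1), and 64-bit (long) values. The class and method names
are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not a Lucene class: worst-case number of terms a
// trie-encoded numeric range can expand to, under the assumed formula
//   n = (bitsPerValue/precStep - 1) * (2^precStep - 1) * 2 + (2^precStep - 1)
public final class MaxTrieTerms {
    private MaxTrieTerms() {}

    public static int maxTerms(int bitsPerValue, int precisionStep) {
        // Keep the sketch simple: only handle steps that divide the bit width.
        if (precisionStep <= 0 || precisionStep > 16
                || bitsPerValue % precisionStep != 0) {
            throw new IllegalArgumentException("unsupported precisionStep");
        }
        int levels = bitsPerValue / precisionStep;   // number of trie levels
        int perLevel = (1 << precisionStep) - 1;     // 2^precStep - 1
        // Every level except one contributes terms on both range ends;
        // the remaining level contributes a single run of terms.
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        int maxClauses = 1024; // the max clause limit discussed in this thread
        for (int step : new int[] {4, 8}) {
            int n = maxTerms(64, step);
            System.out.println("precStep=" + step + " -> max terms " + n
                    + (n > maxClauses ? " (exceeds " : " (within ")
                    + maxClauses + ")");
        }
    }
}
```

Under these assumptions precStep=4 stays far below 1024 clauses in the worst
case, while precStep=8 can exceed it, which is the concern with Solr's
precStep=8 raised further down.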

> The problem starts
> if you combine two or three numeric queries with
> BooleanClause.Occur.MUST in a top-level Boolean query (the typical example
> is a geo query). In this case, the Boolean queries that only consist of MUST
> may be combined into one big one (correct me if I am wrong) and then the max
> clause count becomes a problem.

Actually Lucene never does structural optimizations of BooleanQuery,
and I think it should (though scores would be different).

One exception: if the BooleanQuery has a single clause, it'll rewrite
itself to the rewrite of that one sub-query.

> If we change the default, keep in mind to reopen SOLR-940, as it assumes
> constant score mode by default and Solr's default precStep is 8 ->
> *bang*. Maybe the Solr people should fix this and explicitly set the
> mode for all range queries anyway.

Let's not change the default :)

Mike
