RE: constant-score rewrite mode for NumericRangeQuery

2009-07-18 Thread Uwe Schindler
Hi Mike,

I did some perf tests with the well-known PerfTest.java from the
FieldCacheRangeFilter JIRA issue.

I compared both rewrite modes on a 5 million doc index with precStep=4:

With constant score rewrite: 
avg number of terms: 68.3
TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.6431290998
ms; sum=31994466

With boolean rewrite:
avg number of terms: 68.3
TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms;
sum=31994466

Both sets of numbers were taken after some warm-up queries, and the random
seed was identical (so exactly the same queries). For this index size,
constant score rewrite still looks faster than Boolean rewrite. The warm-up
queries especially take much longer with Boolean rewrite. The problem with
my test here is that the whole index seems to be in the OS cache. If it is
not in the OS cache, I think the much longer time that the first Boolean
queries took will matter even more.
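
The identical-seed setup can be illustrated with a minimal sketch. This is not the PerfTest.java from the JIRA issue: the class SeededRanges and its method are made up for illustration, and the sketch only shows why a fixed seed gives both runs exactly the same range bounds.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration; not part of PerfTest.java or Lucene.
class SeededRanges {
    // Generates `count` random [lower, upper] pairs from a fixed seed.
    static long[][] generate(long seed, int count) {
        Random rnd = new Random(seed);
        long[][] ranges = new long[count][2];
        for (int i = 0; i < count; i++) {
            long a = rnd.nextLong();
            long b = rnd.nextLong();
            ranges[i][0] = Math.min(a, b); // lower bound
            ranges[i][1] = Math.max(a, b); // upper bound
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Two runs with the same seed see the same sequence of ranges.
        long[][] runA = generate(42L, 3);
        long[][] runB = generate(42L, 3);
        System.out.println(Arrays.deepEquals(runA, runB)); // prints "true"
    }
}
```

With the same seed, any difference in timing between the two runs comes from the rewrite mode (or from caching), not from the queries themselves.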

In my opinion, we should keep constant score enabled. My main problem with
Boolean rewrite is the completely useless scoring. A range query should
always have constant score. We could maybe fix this at some point in the
future, so that you can disable scoring for Boolean queries (e.g.
bq.setDoConstantScore(true)). I think this is part of a separate issue in
JIRA (I do not know the number yet).

A second problem with Boolean rewrite: with precStep=4, it is guaranteed
that the query will not hit the 1024 max clause limit (see the formula for
the theoretical maximum number of terms), so no problem at all. The problem
starts if you combine two or three numeric queries with
BooleanClause.Occur.MUST in a top-level Boolean query (the typical example
is a geo query). In this case, the Boolean queries that consist only of MUST
clauses may be combined into one big one (correct me if I am wrong), and
then the max clause count becomes a problem.
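
The arithmetic behind the max clause argument can be sketched as follows. This assumes the estimate given in the NumericRangeQuery Javadoc, n = (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1), and 64-bit values. The class TrieTermBound is a hypothetical helper, not a Lucene class.

```java
// Hypothetical helper for illustration; not a Lucene class.
class TrieTermBound {
    // The 1024 max clause limit mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    // Estimate assumed from the NumericRangeQuery Javadoc:
    // n = (bits/step - 1) * (2^step - 1) * 2 + (2^step - 1)
    static int maxTerms(int bitsPerValue, int precisionStep) {
        int perLevel = (1 << precisionStep) - 1;   // 2^step - 1
        int levels = bitsPerValue / precisionStep; // number of trie levels
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        for (int step : new int[] {4, 8}) {
            int n = maxTerms(64, step);
            System.out.println("precStep=" + step + ": max " + n
                + " terms, within limit: " + (n <= MAX_CLAUSES));
        }
    }
}
```

Under this estimate, precStep=4 gives at most 465 terms for a 64-bit value, which stays below 1024, while precStep=8 gives 3825, which does not.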

If we change the default, keep in mind to reopen SOLR-940, as it assumes
constant score mode is the default, and Solr's default precStep is 8 -
*bang*. Maybe the Solr people should fix this and explicitly set the mode
for all range queries anyway.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Friday, July 17, 2009 8:56 PM
 To: java-dev@lucene.apache.org
 Subject: constant-score rewrite mode for NumericRangeQuery
 
 Should we really default to constant-score rewrite with NumericRangeQuery?
 
 Would BooleanQuery rewrite mode give better performance on a large
 index, since the number of terms should be smallish w/ the default
 precisionStep (4), I think?
 
 Mike
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org






Re: constant-score rewrite mode for NumericRangeQuery

2009-07-18 Thread Michael McCandless
On Sat, Jul 18, 2009 at 6:54 AM, Uwe Schindler <u...@thetaphi.de> wrote:

 I did some perf tests with the well-known PerfTest.java from the
 FieldCacheRangeFilter JIRA issue.

 I compared both rewrite modes on a 5 million doc index with precStep=4:

 With constant score rewrite:
 avg number of terms: 68.3
 TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.6431290998
 ms; sum=31994466

 With boolean rewrite:
 avg number of terms: 68.3
 TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms;
 sum=31994466

 Both sets of numbers were taken after some warm-up queries, and the random
 seed was identical (so exactly the same queries). For this index size,
 constant score rewrite still looks faster than Boolean rewrite.

OK these are good results; thanks for running them!

 The warm-up queries especially take much longer with Boolean rewrite. The
 problem with my test here is that the whole index seems to be in the OS
 cache. If it is not in the OS cache, I think the much longer time that the
 first Boolean queries took will matter even more.

Agreed.

 In my opinion, we should keep constant score enabled.

OK +1

 My main problem with
 Boolean rewrite is the completely useless scoring. A range query should
 always have constant score. We could maybe fix this at some point in the
 future, so that you can disable scoring for Boolean queries (e.g.
 bq.setDoConstantScore(true)). I think this is part of a separate issue in
 JIRA (I do not know the number yet).

I completely agree; we need to make it possible to use the BooleanQuery
expansion method with constant scoring (I already opened an issue for
this: LUCENE-1644).

 A second problem with Boolean rewrite: with precStep=4, it is guaranteed
 that the query will not hit the 1024 max clause limit (see the formula for
 the theoretical maximum number of terms), so no problem at all.

Right.

 The problem starts if you combine two or three numeric queries with
 BooleanClause.Occur.MUST in a top-level Boolean query (the typical example
 is a geo query). In this case, the Boolean queries that consist only of MUST
 clauses may be combined into one big one (correct me if I am wrong), and
 then the max clause count becomes a problem.

Actually Lucene never does structural optimizations of BooleanQuery,
and I think it should (though scores would be different).

One exception: if the BooleanQuery has a single clause, it'll rewrite
itself to the rewrite of that one sub-query.
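
To make the difference concrete, here is a counting-only sketch. It assumes the 1024 limit applies to each BooleanQuery separately and that each range expands to at most 465 terms (the precStep=4 estimate for 64-bit values). ClauseCountModel is a hypothetical model, not Lucene code, and it counts clauses only without modelling query semantics.

```java
// Hypothetical counting model for illustration; these are not Lucene classes.
class ClauseCountModel {
    static final int MAX_CLAUSES = 1024;

    // Largest clause count of any single query when sub-queries stay nested.
    static int nestedWorstCase(int[] termsPerRange) {
        int worst = termsPerRange.length; // top level: one MUST clause per range
        for (int t : termsPerRange) {
            worst = Math.max(worst, t);   // each expanded range is its own query
        }
        return worst;
    }

    // Clause count if all expanded terms ended up in one single query.
    static int flattened(int[] termsPerRange) {
        int sum = 0;
        for (int t : termsPerRange) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] threeRanges = {465, 465, 465}; // assumed worst case per range
        System.out.println("nested worst case: " + nestedWorstCase(threeRanges));
        System.out.println("flattened: " + flattened(threeRanges));
    }
}
```

With three such ranges, the nested form never exceeds 465 clauses in any one query, whereas a merged query would hold 1395, above the 1024 limit. Since no such merging happens, the combined geo-style query stays within the limit under these assumptions.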

 If we change the default, keep in mind to reopen SOLR-940, as it assumes
 constant score mode is the default, and Solr's default precStep is 8 -
 *bang*. Maybe the Solr people should fix this and explicitly set the mode
 for all range queries anyway.

Let's not change the default :)

Mike




constant-score rewrite mode for NumericRangeQuery

2009-07-17 Thread Michael McCandless
Should we really default to constant-score rewrite with NumericRangeQuery?

Would BooleanQuery rewrite mode give better performance on a large
index, since the number of terms should be smallish w/ the default
precisionStep (4), I think?

Mike
