Paul,
Thanks for your very thorough response. It is very helpful.
For all my projects, I'm using the latest Subversion codebase and staying current with any changes there, so that is very good news.
Erik
On Apr 1, 2005, at 1:10 PM, Paul Elschot wrote:
On Friday 01 April 2005 18:14, Erik Hatcher wrote:
I will soon create some tests for this scenario, but wanted to run this
by the list as well....
Great, see below.
What performance differences would be seen between a query like this:
a AND b AND c AND d
This will use a single ConjunctionScorer, and it is the fastest form.
and this one:
((a AND b) AND c) AND d
In other words, will building a query with nested boolean queries be
substantially slower than a single boolean query with many clauses? Or
might it be the other way around?
This will use a ConjunctionScorer for (a AND b), assuming a and b are terms. For the other AND operators a BooleanScorer will be used in 1.4.3. The development version will use a ConjunctionScorer at each AND operator.
The main difference between a ConjunctionScorer and a BooleanScorer
is the use of skipTo(), i.e. the forwarding information in the term docs
index, which allows a scorer to 'fast forward' to a given document.
This 'fast forward' is useful for AND queries, and ConjunctionScorer uses it;
BooleanScorer simply uses next() instead. The next() method iterates
over all documents in a term docs index.
In other words, the nested form should be significantly slower than
the flat form in 1.4.3, and just a bit slower in the development version.
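To make the skipTo() versus next() difference concrete, here is a minimal sketch in Python. It is not Lucene code: the Postings class, its steps counter, and both functions are hypothetical stand-ins, and a binary search models the skip data in the term docs index. The next()-only version is only a rough model of a scorer that visits every entry.

```python
from bisect import bisect_left

class Postings:
    """Hypothetical stand-in for one term's sorted doc id list (not Lucene's API)."""
    def __init__(self, docs):
        self.docs = sorted(set(docs))
        self.pos = -1
        self.steps = 0  # rough cost measure: number of next()/skip_to() calls

    def doc(self):
        return self.docs[self.pos]

    def next(self):
        """Advance by one entry; returns False when the list is exhausted."""
        self.pos += 1
        self.steps += 1
        return self.pos < len(self.docs)

    def skip_to(self, target):
        """Jump to the first entry >= target (binary search models skip data)."""
        self.pos = bisect_left(self.docs, target, max(self.pos, 0))
        self.steps += 1
        return self.pos < len(self.docs)

def and_with_skip(lists):
    """Conjunction that lets each list fast forward to the current candidate."""
    its = [Postings(l) for l in lists]
    hits = []
    if all(it.next() for it in its):
        exhausted = False
        while not exhausted:
            target = max(it.doc() for it in its)
            if all(it.doc() == target for it in its):
                hits.append(target)
                exhausted = not its[0].next()
                continue
            for it in its:
                if it.doc() < target and not it.skip_to(target):
                    exhausted = True
                    break
    return hits, sum(it.steps for it in its)

def and_with_next(lists):
    """Conjunction that walks every entry of every list with next() only."""
    its = [Postings(l) for l in lists]
    counts = {}
    for it in its:
        while it.next():
            counts[it.doc()] = counts.get(it.doc(), 0) + 1
    hits = sorted(d for d, n in counts.items() if n == len(its))
    return hits, sum(it.steps for it in its)
```

With one frequent term and one rare term, both functions return the same hits, but the skipping version touches far fewer entries of the frequent term's list.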
Another skipTo() advantage comes from this form:
(a OR b) AND c
In 1.4.3, this uses a BooleanScorer for both operators, making this as much work as:
(a OR b) OR c
In the development version, the OR operator gets a DisjunctionScorer and the AND operator a ConjunctionScorer, both allowing the use of skipTo(), even on the a and b terms.
In this context (a OR b) can also be for example a fuzzy query or a prefix
query.
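As a rough illustration of how skipping can reach the a and b terms below an OR, here is a small Python sketch over plain sorted lists of doc ids. The function name is hypothetical and this is not how Lucene's scorers are written; bisect_left stands in for skipTo().

```python
from bisect import bisect_left

def or_then_and(a, b, c):
    """Doc ids matching (a OR b) AND c; a, b and c are sorted lists of doc ids."""
    hits = []
    pa = pb = pc = 0
    while pc < len(c):
        target = c[pc]
        # "skipTo(target)" on both sub-clauses of the OR
        pa = bisect_left(a, target, pa)
        pb = bisect_left(b, target, pb)
        # the disjunction's current doc is the smallest doc of its sub-clauses
        candidates = [docs[p] for docs, p in ((a, pa), (b, pb)) if p < len(docs)]
        if not candidates:
            break  # a and b are both exhausted, so no further matches
        union_doc = min(candidates)
        if union_doc == target:
            hits.append(target)
            pc += 1
        else:
            # "skipTo(union_doc)" on c
            pc = bisect_left(c, union_doc, pc)
    return hits
```

Neither a nor b is ever scanned entry by entry here: each is only moved forward to the next candidate coming from c.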
The development version also uses skipTo() on b in the following situations:
+a b
a -b
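The two situations above can be sketched the same way in Python. The function names are hypothetical and the sketch ignores scoring; the point is only that b is moved forward to each document of a instead of being read in full.

```python
from bisect import bisect_left

def required_minus_prohibited(a_docs, b_docs):
    """Doc ids matching `a -b`; both inputs are sorted lists of doc ids."""
    hits = []
    b_pos = 0
    for doc in a_docs:
        # "skipTo(doc)" on b: jump to the first b entry >= doc
        b_pos = bisect_left(b_docs, doc, b_pos)
        if b_pos < len(b_docs) and b_docs[b_pos] == doc:
            continue  # doc also contains b, so it is excluded
        hits.append(doc)
    return hits

def required_plus_optional(a_docs, b_docs):
    """Matches for `+a b` as (doc, b_also_matched) pairs."""
    out = []
    b_pos = 0
    for doc in a_docs:
        b_pos = bisect_left(b_docs, doc, b_pos)
        matched = b_pos < len(b_docs) and b_docs[b_pos] == doc
        out.append((doc, matched))  # b only affects the score, not membership
    return out
```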
So, when you measure, please use both 1.4.3 and the development version
to see the differences. And, of course, the larger your index, the better.
As the code is still a bit young, you might be in for some surprises, too.
skipTo() has the biggest advantages when the index data is not
available in any cache.
Regards, Paul Elschot.
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]