Hi

In an attempt to understand how to do document-level boosting (following
this thread
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%[email protected]%3E),
I experimented with the 3 easiest ways that currently exist in Lucene (that
I'm aware of, maybe there are more): two of them use CustomScoreQuery and
the third uses the new Expression module.

I created a simple index with two documents with the field "f" and value
"test doc" (for both). I also added the field "boost" with values 1L
(doc-0) and 2L (doc-1). I then searched using each method and got different
results w.r.t. computed scores:

*CustomScoreProvider*
As far as I understand, you should override
CustomScoreQuery.getCustomScoreProvider if you want to apply a function
other than score*boost (e.g. score^boost) to the documents. Nevertheless,
nothing prevents you from supplying a CustomScoreProvider that reads from the
'boost' field and does the multiplication itself (since it receives the
AtomicReaderContext). I wrote one, and the resulting scores are:

search CustomScoreProvider
doc=1, score=0.74316853
doc=0, score=0.37158427
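
For reference, the two combining functions mentioned above come down to the following arithmetic. This is a minimal plain-Java sketch that uses no Lucene classes. The class and method names (BoostCombiners, multiply, power) are made up for illustration, and the sub-query score of 0.37158427 is taken from doc-0's score in the output above, where the boost is 1.

```java
/** Illustration only: a hypothetical helper, not part of Lucene. */
final class BoostCombiners {

    /** The score*boost combination described above. */
    static float multiply(float subQueryScore, long boost) {
        return subQueryScore * boost;
    }

    /** An alternative combination, score^boost. */
    static float power(float subQueryScore, long boost) {
        return (float) Math.pow(subQueryScore, boost);
    }

    public static void main(String[] args) {
        float subQueryScore = 0.37158427f; // doc-0's score above (boost = 1)
        long[] boosts = {1L, 2L};          // boost values of doc-0 and doc-1
        for (int doc = 0; doc < boosts.length; doc++) {
            System.out.println("doc=" + doc
                + ", score*boost=" + multiply(subQueryScore, boosts[doc])
                + ", score^boost=" + power(subQueryScore, boosts[doc]));
        }
    }
}
```

Note that with a sub-query score below 1, score^boost would rank doc-1 below doc-0, so the choice of function can change the order of results and not only the magnitudes.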

*FunctionQuery*
I wasn't able to find a ValueSource which reads from an NDV field, so I
wrote a NumericDocValuesFieldSource which returns a LongValues that reads
from the NumericDocValues (if there really isn't one, I can open an issue
to add it). The resulting scores are:

search NumericDocValuesFieldSource
doc=1, score=0.32644913
doc=0, score=0.16322456

*Expression*
I tried the new module, following TestDemoExpression and compiled the
expression using this code:

    Expression expr = JavascriptCompiler.compile("_score * boost");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE));
    bindings.add(new SortField("boost", SortField.Type.LONG));

The result scores are:

search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791

As you can see, both the CustomScoreProvider and Expression methods return the
same values for the docs (for Expression, in the sort field rather than the
score), while the FunctionQuery method returns different scores. The reason is
that when using FunctionQuery, the scores of the ValueSources are multiplied
by queryWeight, which seems correct to me.
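
A quick arithmetic check on the numbers above is consistent with this: dividing each FunctionQuery score by the CustomScoreProvider score of the same doc gives practically the same factor for both docs, so the ranking is unchanged. The sketch below is plain Java with the scores copied from the output above. The class name ScoreRatioCheck is made up, and reading the common factor as queryWeight is an interpretation, not something this code verifies.

```java
/** Illustration only: compares the two sets of scores reported above. */
final class ScoreRatioCheck {

    /** Ratio of a FunctionQuery score to the CustomScoreProvider score of the same doc. */
    static double ratio(double functionQueryScore, double providerScore) {
        return functionQueryScore / providerScore;
    }

    public static void main(String[] args) {
        // Scores copied from the search output above, indexed by doc id.
        double[] providerScores = {0.37158427, 0.74316853};
        double[] functionQueryScores = {0.16322456, 0.32644913};

        for (int doc = 0; doc < providerScores.length; doc++) {
            System.out.println("doc=" + doc + ", ratio="
                + ratio(functionQueryScores[doc], providerScores[doc]));
        }
    }
}
```

Since both docs are scaled by the same factor, the two methods differ only in the absolute scores, not in the order of results.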

Expression is more about sorting than scoring, as far as I understand (for
instance, the resulting FieldDocs.score is NaN), so I'm ok with it not
factoring in queryWeight (maybe we could implement such an expression?). What
I like about it is that I didn't have to implement anything (e.g.
NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you
care about is the order of results, it gets the job done.

So between FunctionQuery and CustomScoreProvider, which is the correct way
to boost a document by an NDV field? I think FunctionQuery?

Separately, I think we can improve the CSQ.getCSProvider jdocs. They say: "The
default implementation returns a default implementation as specified in the
docs of CustomScoreProvider", but the jdocs of CSP don't mention that it
multiplies.

Shai
