I think I actually abused CSProvider; it's not supposed to be used
that way. It is really meant for cases where you want to apply a different
combination of the two scores. While nothing prevents you from reading the
scores from a different source, it's better to implement that capability
through a custom ValueSource. So maybe we should put such a note in the
CSProvider jdocs...

Shai


On Fri, Sep 20, 2013 at 3:01 PM, Shai Erera <[email protected]> wrote:

> Hi
>
> In an attempt to understand how to do document-level boosting (following
> this thread
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%[email protected]%3E),
> I experimented with the 3 easiest ways that currently exist in Lucene
> (that I'm aware of, maybe there are more): two of them use CustomScoreQuery
> and the third uses the new Expression module.
>
> I created a simple index with two documents with the field "f" and value
> "test doc" (for both). I also added the field "boost" with values 1L
> (doc-0) and 2L (doc-1). I then searched using each method and got different
> results w.r.t. computed scores:
>
> *CustomScoreProvider*
> As far as I understand, you should override
> CustomScoreQuery.getCustomScoreProvider if you want to apply a different
> function than score*boost (e.g. score^boost) to the documents.
> Nevertheless, nothing prevents you from giving a CustomScoreProvider which
> reads from the 'boost' field and does the multiplication (since it receives
> the AtomicReaderContext). I wrote one and the result scores are:
>
> search CustomScoreProvider
> doc=1, score=0.74316853
> doc=0, score=0.37158427
>
> *FunctionQuery*
> I wasn't able to find a ValueSource which reads from an NDV field, so I
> wrote a NumericDocValuesFieldSource which returns a LongValues that reads
> from the NumericDocValues (if there isn't indeed one, I can open an issue
> to add it). The result scores are:
>
> search NumericDocValuesFieldSource
> doc=1, score=0.32644913
> doc=0, score=0.16322456
>
> *Expression*
> I tried the new module, following TestDemoExpression, and compiled the
> expression using this code:
>
>     Expression expr = JavascriptCompiler.compile("_score * boost");
>     SimpleBindings bindings = new SimpleBindings();
>     bindings.add(new SortField("_score", SortField.Type.SCORE));
>     bindings.add(new SortField("boost", SortField.Type.LONG));
>
> The result scores are:
>
> search Expression
> doc=1, score=NaN, field=0.7431685328483582
> doc=0, score=NaN, field=0.3715842664241791
>
> As you can see, both the CustomScoreProvider and Expression methods return
> the same scores for the docs, while the FunctionQuery method returns different
> scores. The reason is that when using FunctionQuery, the scores of the
> ValueSources are multiplied by queryWeight, which seems correct to me.
>
> Expression is more about sorting than scoring as far as I understand (for
> instance, the result FieldDocs.score is NaN), so I'm ok with it not
> factoring in queryWeight (maybe we could implement such an expression?). What
> I like about it is that I didn't have to implement anything (e.g.
> NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you
> care about is the order of results, it gets the job done.
>
> So between FunctionQuery and CustomScoreProvider, which is the correct way
> to boost a document by an NDV field? I think FunctionQuery?
>
> Separately, I think we can improve CSQ.getCSProvider jdocs. They say: "The
> default implementation returns a default implementation as specified in
> the docs of CustomScoreProvider" but the jdocs of CSP don't mention it
> multiplies.
>
> Shai
>
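The score arithmetic reported in the quoted message can be checked with a small sketch. It uses plain Java only (no Lucene calls); the class and method names are hypothetical, and the constants are the scores copied from the message. It shows that doc-1 scores twice doc-0 under both methods, and that the two methods differ by one constant factor, which would be the queryWeight if the explanation given in the message holds.

```java
public class BoostScoreCheck {
    // Scores copied from the quoted message (doc-0 has boost 1, doc-1 has boost 2).
    static final float CSP_DOC0 = 0.37158427f; // CustomScoreProvider, doc-0
    static final float CSP_DOC1 = 0.74316853f; // CustomScoreProvider, doc-1
    static final float FQ_DOC0 = 0.16322456f;  // FunctionQuery, doc-0
    static final float FQ_DOC1 = 0.32644913f;  // FunctionQuery, doc-1

    /** Multiplicative combination: sub-query score times the per-document boost. */
    static float multiply(float subQueryScore, long boost) {
        return subQueryScore * boost;
    }

    /** Ratio of two scores, computed in double precision. */
    static double ratio(float a, float b) {
        return (double) a / (double) b;
    }

    public static void main(String[] args) {
        // Within each method, doc-1 (boost 2) should score twice doc-0 (boost 1).
        System.out.println("CSP doc1/doc0 = " + ratio(CSP_DOC1, CSP_DOC0));
        System.out.println("FQ  doc1/doc0 = " + ratio(FQ_DOC1, FQ_DOC0));

        // Across methods, the same factor should separate the scores of each doc.
        // Assumption taken from the message: this factor is the queryWeight.
        System.out.println("FQ/CSP doc0   = " + ratio(FQ_DOC0, CSP_DOC0));
        System.out.println("FQ/CSP doc1   = " + ratio(FQ_DOC1, CSP_DOC1));
    }
}
```

Because the factor between the two methods is the same for both documents, the ranking is identical in either case; only the absolute score values differ.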
