[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

ASF subversion and git services (Jira) Thu, 04 Feb 2021 12:44:04 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279152#comment-17279152
 ]


ASF subversion and git services commented on LUCENE-9725:
---------------------------------------------------------

Commit c3f5454d4903897211021eed0c824cb3797d85d9 in lucene-solr's branch 
refs/heads/master from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c3f5454 ]

LUCENE-9725: Allow BM25FQuery to use other similarities. (#2293)

From a high level, BM25FQuery (1) computes statistics that represent the
combined field content and (2) passes these to a score function. This model
makes sense for many similarities besides BM25.

This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one
configured on IndexSearcher. It also renames BM25FQuery since it's no longer
specific to BM25.

> Allow BM25FQuery to use other similarities
> ------------------------------------------
>
>                 Key: LUCENE-9725
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9725
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
>
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/scores are combined across fields.
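
The two steps described in the issue can be sketched in plain Java. This is a
toy illustration only, not Lucene code: the class CombinedFieldSketch, the
FieldStats holder, the ScoreFunction interface, and both example scoring
functions are hypothetical names invented here. It also assumes that term
frequencies and field lengths are combined as weighted sums, which is the
"additive norms" restriction mentioned above; the actual implementation may
differ in detail.

```java
import java.util.List;

/** Toy illustration only: not Lucene code. All names are hypothetical. */
class CombinedFieldSketch {

    /** Statistics for one term in one field of one document (hypothetical). */
    static final class FieldStats {
        final double weight;     // per-field boost
        final long termFreq;     // occurrences of the term in this field
        final long fieldLength;  // number of tokens in this field

        FieldStats(double weight, long termFreq, long fieldLength) {
            this.weight = weight;
            this.termFreq = termFreq;
            this.fieldLength = fieldLength;
        }
    }

    /** Stand-in for a pluggable similarity: scores the combined statistics. */
    interface ScoreFunction {
        double score(double combinedFreq, double combinedLength, double avgCombinedLength);
    }

    /** Step 1a: weighted sum of term frequencies across fields. */
    static double combinedFreq(List<FieldStats> fields) {
        double sum = 0;
        for (FieldStats f : fields) {
            sum += f.weight * f.termFreq;
        }
        return sum;
    }

    /** Step 1b: weighted sum of field lengths (assumes additive norms). */
    static double combinedLength(List<FieldStats> fields) {
        double sum = 0;
        for (FieldStats f : fields) {
            sum += f.weight * f.fieldLength;
        }
        return sum;
    }

    /** Step 2: hand the combined statistics to whichever function is configured. */
    static double score(List<FieldStats> fields, double avgCombinedLength, ScoreFunction fn) {
        return fn.score(combinedFreq(fields), combinedLength(fields), avgCombinedLength);
    }

    /** BM25-style term-frequency saturation; IDF is omitted for brevity. */
    static final ScoreFunction BM25_LIKE = (freq, len, avgLen) -> {
        double k1 = 1.2;
        double b = 0.75;
        return freq / (freq + k1 * (1 - b + b * len / avgLen));
    };

    /** Boolean-style scoring: constant score when the term is present. */
    static final ScoreFunction BOOLEAN_LIKE = (freq, len, avgLen) -> freq > 0 ? 1.0 : 0.0;

    public static void main(String[] args) {
        // One document with a short, boosted "title" and a longer "body".
        List<FieldStats> doc = List.of(
                new FieldStats(2.0, 1, 5),
                new FieldStats(1.0, 3, 100));
        System.out.println(score(doc, 120.0, BM25_LIKE));
        System.out.println(score(doc, 120.0, BOOLEAN_LIKE));
    }
}
```

Step 1 never looks at the scoring function and step 2 never looks at
individual fields, so swapping BM25_LIKE for BOOLEAN_LIKE requires no change
to how the statistics are combined.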



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
