[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-05-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344393#comment-17344393
 ] 

ASF subversion and git services commented on LUCENE-9725:
-

Commit e4d4438e047eb68110e6d7c6242c11c3fcd121e2 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e4d4438 ]

LUCENE-9725: Remove unused imports.


> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Assignee: Julie Tibshirani
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279229#comment-17279229
 ] 

ASF subversion and git services commented on LUCENE-9725:
-

Commit baef16a9c9fb31e7827631028379e3e3f3c24a61 in lucene-solr's branch 
refs/heads/branch_8x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=baef16a ]

LUCENE-9725: Allow BM25FQuery to use other similarities. (#2293)

>From a high level, BM25FQuery (1) computes statistic that represent the 
>combined
field content and (2) passes these to a score function. This model makes sense
for many similarities besides BM25.

This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one
configured on IndexSearcher. It also renames BM25FQuery since it's no longer
specific to BM25.


> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279152#comment-17279152
 ] 

ASF subversion and git services commented on LUCENE-9725:
-

Commit c3f5454d4903897211021eed0c824cb3797d85d9 in lucene-solr's branch 
refs/heads/master from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c3f5454 ]

LUCENE-9725: Allow BM25FQuery to use other similarities. (#2293)

>From a high level, BM25FQuery (1) computes statistic that represent the 
>combined
field content and (2) passes these to a score function. This model makes sense
for many similarities besides BM25.

This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one
configured on IndexSearcher. It also renames BM25FQuery since it's no longer
specific to BM25.

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-04 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279130#comment-17279130
 ] 

Julie Tibshirani commented on LUCENE-9725:
--

That makes sense, I'll give it thought and we can discuss on LUCENE-8711 !

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278677#comment-17278677
 ] 

Adrien Grand commented on LUCENE-9725:
--

+1

It would be nice if we could eventually make cross-field scoring a first-class 
feature of our Similarity API. For instance, if we made the Similarity API 
responsible for combining multiple norms together, we would no longer have to 
put this requirement on how norms must be encoded in the index in the 
CombinedFieldQuery javadocs.

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-02 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277545#comment-17277545
 ] 

Julie Tibshirani commented on LUCENE-9725:
--

I opened https://github.com/apache/lucene-solr/pull/2293 to show the idea.

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org