tkarampAlpha opened a new issue, #13796:
URL: https://github.com/apache/lucene/issues/13796
### Description
It seems that for SpanOrQuery, the IDF of terms belonging to subqueries that
do not match a given document still affects that document's score.
I observed this on an index containing 3 documents:
```
doc1:
field: something
doc2:
field: nothing
doc3:
field: anything
```
And I issue the following query:
```spanOr([Contents:something, Contents:nothing])```
If you check the score explanation, you will notice that the score of each
matching document includes the idf of both terms, even though only one term
matches each document.
This is an example of the explanation of the first document's score:
```
3.9616547 = weight(spanOr([Contents:something, Contents:nothing]) in 0) [AsBM25Similarity], result of:
  3.9616547 = score(freq=1.0), computed as boost * idf * tf from:
    51.0 = boost
    3.9616585 = idf, sum of:
      1.9808292 = idf for term nothing , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
      1.9808292 = idf for term something , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
    0.019607842 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = phraseFreq=1.0
      50.0 = k1, term saturation parameter
      0.0 = b, length normalization parameter
      1.0 = dl, length of field
      2.0 = avgdl, average length of field
```
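To make the effect concrete, the arithmetic in the explanation can be reproduced by hand. The sketch below is not Lucene code: it only re-evaluates the formulas exactly as they are printed in the explanation above (including the trailing `+ 1` in the idf), using the values shown there. The names `idf`, `tf`, `reported`, and `matching_only` are hypothetical helpers introduced for illustration, and `matching_only` is just what the score would be if only the idf of the matching term were counted, which is my reading of the expected behavior.

```python
import math


def idf(doc_freq: int, doc_count: int) -> float:
    # Formula as printed in the explanation above, including the trailing "+ 1".
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)) + 1


def tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    # Formula as printed in the explanation above.
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the explanation for the first document.
boost = 51.0
idf_something = idf(doc_freq=1, doc_count=3)
idf_nothing = idf(doc_freq=1, doc_count=3)
tf_doc1 = tf(freq=1.0, k1=50.0, b=0.0, dl=1.0, avgdl=2.0)

# What the explanation reports: idf is summed over both spanOr clauses.
reported = boost * (idf_something + idf_nothing) * tf_doc1

# Hypothetical alternative: only the term that matches doc1 contributes.
matching_only = boost * idf_something * tf_doc1

print(f"reported      = {reported:.6f}")
print(f"matching_only = {matching_only:.6f}")
```

With these parameters `boost * tf` is 1, so the reported score is simply the sum of the two idf values, twice what the matching term alone would contribute.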
### Version and environment details
Lucene 9.7.0, used through Solr 9.3.0