Hello,
I just wrote a Solr query parser for BlendedTermQuery on Lucene 6.0 using BM25
similarity, and I have a very simple unit test to check whether it works at
all. To my surprise, one of the results has a negative score, caused by a
negative IDF because docFreq is higher than docCount for that term on that
field. Here are the test documents:
assertU(adoc("id", "1", "text", "rare term"));
assertU(adoc("id", "2", "text_nl", "less rare term"));
assertU(adoc("id", "3", "text_nl", "rarest term"));
assertU(commit());
My query parser creates the following Lucene query:
BlendedTermQuery(Blended(text:rare text:term text_nl:rare text_nl:term)), which
looks fine to me. But this is what I get back when issuing that query against
the above set of documents; the third result (id 1) is the one with a negative score.
<result name="response" numFound="3" start="0" maxScore="0.1805489">
<doc>
<str name="id">3</str>
<float name="score">0.1805489</float></doc>
<doc>
<str name="id">2</str>
<float name="score">0.14785346</float></doc>
<doc>
<str name="id">1</str>
<float name="score">-0.004004207</float></doc>
</result>
<lst name="debug">
<str name="rawquerystring">{!blended fl=text,text_nl}rare term</str>
<str name="querystring">{!blended fl=text,text_nl}rare term</str>
<str name="parsedquery">BlendedTermQuery(Blended(text:rare text:term
text_nl:rare text_nl:term))</str>
<str name="parsedquery_toString">Blended(text:rare text:term text_nl:rare
text_nl:term)</str>
<lst name="explain">
<str name="3">
0.1805489 = max plus 0.01 times others of:
0.1805489 = weight(text_nl:term in 2) [], result of:
0.1805489 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
0.18232156 = idf(docFreq=2, docCount=2)
0.9902773 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.5 = avgFieldLength
2.56 = fieldLength
</str>
<str name="2">
0.14785345 = max plus 0.01 times others of:
0.14638956 = weight(text_nl:rare in 1) [], result of:
0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
0.18232156 = idf(docFreq=2, docCount=2)
0.8029196 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.5 = avgFieldLength
4.0 = fieldLength
0.14638956 = weight(text_nl:term in 1) [], result of:
0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
0.18232156 = idf(docFreq=2, docCount=2)
0.8029196 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.5 = avgFieldLength
4.0 = fieldLength
</str>
<str name="1">
-0.004004207 = max plus 0.01 times others of:
-0.20021036 = weight(text:rare in 0) [], result of:
-0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
-0.22314355 = idf(docFreq=2, docCount=1)
0.89722675 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.0 = avgFieldLength
2.56 = fieldLength
-0.20021036 = weight(text:term in 0) [], result of:
-0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
-0.22314355 = idf(docFreq=2, docCount=1)
0.89722675 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.0 = avgFieldLength
2.56 = fieldLength
</str>
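As a sanity check on the explain output above, here is a small sketch that recomputes the idf and tfNorm values by hand. It assumes BM25Similarity computes idf as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) and tfNorm as freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)). The class and method names are made up for illustration and are not Lucene API. With docFreq=2 and docCount=1 the argument of the log drops below 1, which is where the negative idf comes from.

```java
// Hypothetical helper, not part of Lucene: recomputes numbers from the explain output.
public class Bm25ExplainCheck {

    // Assumed BM25 idf: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)).
    public static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Assumed BM25 term-frequency normalization with length normalization.
    public static double tfNorm(double freq, double k1, double b,
                                double avgFieldLength, double fieldLength) {
        return freq * (k1 + 1)
                / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;

        // text_nl:term in doc 2, explain reports idf(docFreq=2, docCount=2)
        double idfNl = idf(2, 2);
        double tfNl = tfNorm(1.0, k1, b, 2.5, 2.56);

        // text:rare in doc 0, explain reports idf(docFreq=2, docCount=1)
        double idfText = idf(2, 1);
        double tfText = tfNorm(1.0, k1, b, 2.0, 2.56);

        System.out.printf("text_nl: idf=%.8f tfNorm=%.8f score=%.8f%n",
                idfNl, tfNl, idfNl * tfNl);
        System.out.printf("text:    idf=%.8f tfNorm=%.8f score=%.8f%n",
                idfText, tfText, idfText * tfText);
    }
}
```

The values this produces can be compared against the idf, tfNorm and weight lines in the explain output; any docFreq greater than docCount + 0.5 gives a negative idf under this formula.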
What am I doing wrong? Or did I catch a bug?
Thanks,
Markus