Re: BlendedTermQuery causing negative IDF?

Ahmet Arslan Tue, 19 Apr 2016 07:17:37 -0700


Hi Markus,


This is a known property of BM25: it can produce negative scores for common
terms. Most term-weighting models were developed for indices from which stop
words have been removed, so they tend to have problems scoring common terms.
By the way, the DFI model does a decent job of handling common terms.
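
As a rough check, the idf values in the explain output quoted below match
log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The sketch below assumes
that formula; the class and method names are made up for illustration, and it is
not Lucene's actual code. Under this assumption the value drops below zero once
docFreq exceeds docCount, which is the case for the "text" field in the quoted
test.

```java
public class Bm25IdfSketch {

    // Assumed idf formula, chosen because it reproduces the idf lines
    // in the quoted explain output. Hypothetical helper, not a Lucene API.
    public static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // text_nl field in the quoted test: docFreq=2, docCount=2, i.e. log(1.2)
        System.out.println(idf(2, 2));
        // text field in the quoted test: docFreq=2, docCount=1, i.e. log(0.8)
        System.out.println(idf(2, 1));
    }
}
```

The first call gives a small positive value and the second a negative one,
in line with the idf(docFreq=2, docCount=2) and idf(docFreq=2, docCount=1)
lines in the explain output.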

Ahmet



On Tuesday, April 19, 2016 4:48 PM, Markus Jelsma <[email protected]> 
wrote:
Hello,

I just made a Solr query parser for BlendedTermQuery on Lucene 6.0 using BM25 
similarity, and I have a very simple unit test to see whether anything works at 
all. To my surprise, one of the results has a negative score, caused by a 
negative IDF, because docFreq is higher than docCount for that term on that 
field. Here are the test documents:

    assertU(adoc("id", "1", "text", "rare term"));
    assertU(adoc("id", "2", "text_nl", "less rare term"));
    assertU(adoc("id", "3", "text_nl", "rarest term"));
    assertU(commit());

My query parser creates the following Lucene query: 
BlendedTermQuery(Blended(text:rare text:term text_nl:rare text_nl:term)), which 
looks fine to me. But this is what I get back when issuing that query against 
the above set of documents. The third result (id 1) is the one with a negative 
score.

<result name="response" numFound="3" start="0" maxScore="0.1805489">
  <doc>
    <str name="id">3</str>
    <float name="score">0.1805489</float></doc>
  <doc>
    <str name="id">2</str>
    <float name="score">0.14785346</float></doc>
  <doc>
    <str name="id">1</str>
    <float name="score">-0.004004207</float></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">{!blended fl=text,text_nl}rare term</str>
  <str name="querystring">{!blended fl=text,text_nl}rare term</str>
  <str name="parsedquery">BlendedTermQuery(Blended(text:rare text:term 
text_nl:rare text_nl:term))</str>
  <str name="parsedquery_toString">Blended(text:rare text:term text_nl:rare 
text_nl:term)</str>
  <lst name="explain">
    <str name="3">
0.1805489 = max plus 0.01 times others of:
  0.1805489 = weight(text_nl:term in 2) [], result of:
    0.1805489 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.9902773 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        2.56 = fieldLength
</str>
    <str name="2">
0.14785345 = max plus 0.01 times others of:
  0.14638956 = weight(text_nl:rare in 1) [], result of:
    0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.8029196 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        4.0 = fieldLength
  0.14638956 = weight(text_nl:term in 1) [], result of:
    0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.8029196 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        4.0 = fieldLength
</str>
    <str name="1">
-0.004004207 = max plus 0.01 times others of:
  -0.20021036 = weight(text:rare in 0) [], result of:
    -0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
      -0.22314355 = idf(docFreq=2, docCount=1)
      0.89722675 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.0 = avgFieldLength
        2.56 = fieldLength
  -0.20021036 = weight(text:term in 0) [], result of:
    -0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
      -0.22314355 = idf(docFreq=2, docCount=1)
      0.89722675 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.0 = avgFieldLength
        2.56 = fieldLength
</str>
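
The tfNorm and per-term score lines above can be checked by hand. This is a
minimal sketch, assuming tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b *
fieldLength / avgFieldLength)) and score = idf * tfNorm; the formula is inferred
from the numbers in the explain output, and the class and method names are
hypothetical.

```java
public class Bm25TfNormSketch {

    // Assumed tfNorm formula, inferred from the explain output above.
    // Hypothetical helper, not a Lucene API.
    public static double tfNorm(double freq, double k1, double b,
                                double fieldLength, double avgFieldLength) {
        return freq * (k1 + 1)
                / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    // Per-term score as the product of idf and tfNorm, as the explain shows.
    public static double score(double idf, double tfNorm) {
        return idf * tfNorm;
    }

    public static void main(String[] args) {
        // Document 1, field "text": avgFieldLength=2.0, fieldLength=2.56
        double norm = tfNorm(1.0, 1.2, 0.75, 2.56, 2.0);
        // idf value copied from the explain line idf(docFreq=2, docCount=1)
        double idf = -0.22314355;
        System.out.println(norm);
        System.out.println(score(idf, norm));
    }
}
```

Since tfNorm is positive for these inputs, the sign of each per-term score
follows the sign of the idf.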

What am I doing wrong? Or did I catch a bug?

Thanks,
Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
