RE: Odd Boolean scoring behavior?

2011-01-21 Thread karl.wright

Turns out that I inadvertently reverted one of Simon's changes to 
CutoffQueryWrapper, which explains the second effect.  So all is now well.

Thanks for your assistance!
Karl


From: Wright Karl (Nokia-MS/Boston)
Sent: Thursday, January 20, 2011 9:44 PM
To: dev@lucene.apache.org
Subject: RE: Odd Boolean scoring behavior?

Found the cause of the zero querynorms, and fixed it.  But the results are 
still not as I would expect.  The first result has language=ger but scores 
higher than the second result which has language=eng.  And yet, my query is 
boosting like this:

Boolean
 OR Boolean (boost = 100.0)
  AND (language:eng)
  AND (stuff)
 OR (stuff)

... where (stuff) is the same stuff in both cases.

Here's the scoring for two results, the first one out of language, and the 
second one in language:

0.018082526 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@6cdcb5eb sum of:
  0.018059647 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@e2b8f23 sum of:
    0.015771711 = (MATCH) weight(language:eng in 52867945), product of:
      0.015771711 = queryWeight(language:eng), product of:
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        0.015771711 = queryNorm
      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
        1.0 = tf(termFreq(language:eng)=0)
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        1.0 = fieldNorm(field=language, doc=52867945)
    0.0022879362 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 product of:
      0.331206 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 sum of:
        0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4
othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4
othervalue_5:bunker othervalue_5:bunner^5.997396E-4
othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4
value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4
value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4
value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4
value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4
value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4
value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker
value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4
value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4
value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4
value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4
value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4
value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4
value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4
value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker
value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4
value_5:busker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4
value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4
value_5:monuments^7.9949305E-4))^0.667), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4
othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4
othervalue_5:bunker othervalue_5:bunner^5.997396E-4
othervalue_5:burker^5.997396E-4) +(value_5:monument
value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4
value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.5714286),
product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4
value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4
value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4
value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4
value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4
value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4
value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker
value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4
value_5:busker^5.997396E-4) +(othervalue_5:monument
othervalue_5:monumento^7.9949305E-4
othervalue_5:monuments^7.9949305E-4))^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+value_5:hill
+(value_5:monument value_5:monumenta^7.9949305E-4
value_5:monumentc^7.9949305E-4
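
As a rough cross-check of the figures in the explain output above, the sketch below redoes the arithmetic in plain Java. It assumes that the coord factor is 21/3040 (the value printed in the earlier explain output in this thread) and that the 0.331206 subtotal is the sum of 21 matching clauses, each scoring the queryNorm value. The class and method names are made up for illustration and are not part of Lucene.

```java
public class ExplainArithmetic {
    // Values copied from the explain output above.
    static final double QUERY_NORM = 0.015771711;   // score of each matching clause
    static final double TOTAL_SCORE = 0.018082526;  // top-level score of the document
    // Assumed from coord(21/3040) in the earlier explain output of this thread.
    static final int MATCHED_CLAUSES = 21;
    static final int TOTAL_CLAUSES = 3040;

    /** Coordination factor, assumed to be matched clauses divided by total clauses. */
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    /** Score of (stuff) inside the boosted branch: sum of clause scores times coord. */
    static double boostedStuffScore() {
        return MATCHED_CLAUSES * QUERY_NORM * coord(MATCHED_CLAUSES, TOTAL_CLAUSES);
    }

    /** Score of the boosted branch: the language:eng weight plus the (stuff) score. */
    static double boostedBranchScore() {
        return QUERY_NORM + boostedStuffScore();
    }

    /** What is left of the total once the boosted branch is subtracted. */
    static double remainder() {
        return TOTAL_SCORE - boostedBranchScore();
    }

    public static void main(String[] args) {
        System.out.printf("coord              = %.10f%n", coord(MATCHED_CLAUSES, TOTAL_CLAUSES));
        System.out.printf("boosted (stuff)    = %.10f%n", boostedStuffScore());
        System.out.printf("boosted branch     = %.10f%n", boostedBranchScore());
        System.out.printf("remainder          = %.10f%n", remainder());
        System.out.printf("(stuff)/remainder  = %.2f%n", boostedStuffScore() / remainder());
    }
}
```

The remainder comes out at roughly one hundredth of the boosted (stuff) score. That is at least consistent with the 100.0 boost being applied to the first branch and the same (stuff) clauses scoring unboosted in the second. The second branch is cut off in the output above, so this is an inference rather than something the explain shows directly.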

RE: Odd Boolean scoring behavior?

2011-01-21 Thread Uwe Schindler

BTW: What is CutOffQueryWrapper?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



RE: Odd Boolean scoring behavior?

2011-01-21 Thread karl.wright

This is a query that wraps another query and limits the number of results 
returned from the wrapped query to some specific number.  It seems very helpful 
when you have a lot of clauses in a query and each of them is expected to match 
only a few documents, but there is a chance that one clause returns lots of 
results, which by definition are less than useful.  For example:

OR
 cutoff(clause_1)
 cutoff(clause_2)
 cutoff(clause_3)
...

... where clause_N can potentially match something very common, e.g. "of" or 
"road", but where you can't just treat "of" and "road" as stop words.
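
To make the idea concrete, here is a minimal sketch of the cutoff behaviour in plain Java. It is not the CutoffQueryWrapper class discussed in this thread, whose source is not shown here. `CutoffIterator`, `firstN`, and the use of plain integer document IDs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration: passes through at most maxResults items of a wrapped iterator. */
public class CutoffIterator implements Iterator<Integer> {
    private final Iterator<Integer> inner;
    private int remaining;

    public CutoffIterator(Iterator<Integer> inner, int maxResults) {
        this.inner = inner;
        this.remaining = maxResults;
    }

    @Override
    public boolean hasNext() {
        // Stop as soon as the cutoff is reached, even if the wrapped clause has more matches.
        return remaining > 0 && inner.hasNext();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return inner.next();
    }

    /** Convenience helper: collect the first maxResults document IDs of a clause. */
    public static List<Integer> firstN(Iterator<Integer> docIds, int maxResults) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> cut = new CutoffIterator(docIds, maxResults);
        while (cut.hasNext()) {
            out.add(cut.next());
        }
        return out;
    }
}
```

A clause that matches a very common word would be cut off after `maxResults` documents, while a clause with only a handful of matches passes through unchanged.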

Hope this helps.
Karl



-Original Message-
From: ext Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 21, 2011 5:37 AM
To: dev@lucene.apache.org
Subject: RE: Odd Boolean scoring behavior?

BTW: What is CutOffQueryWrapper?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



Re: Odd Boolean scoring behavior?

2011-01-20 Thread Yonik Seeley

On Thu, Jan 20, 2011 at 2:17 PM,  karl.wri...@nokia.com wrote:
 The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any
 effect.  I can change it all over the place, and nothing much changes.

Then perhaps your language term doesn't actually match anything in the
index?  (i.e. how is it analyzed?)
Next step would be to get score explanations (just add debugQuery=true
if you're using Solr, or see IndexSearcher.explain() if not).

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright

I tried commenting out the final OR term, and that excluded all records that 
were out-of-language as expected.  It's just the boost that doesn't seem to 
work.

Exploring the explain is challenging because of its size, but there are NO 
boosts recorded of the size I am using (10.0).  Here's the basic structure of 
the first result.

0.0 = (MATCH) sum of:
  0.0 = (MATCH) sum of:
    0.0 = (MATCH) weight(language:eng in 52867945), product of:
      0.0 = queryWeight(language:eng), product of:
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        0.0 = queryNorm
      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
        1.0 = tf(termFreq(language:eng)=0)
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        1.0 = fieldNorm(field=language, doc=52867945)
    0.0 = (MATCH) product of:
      0.0 = (MATCH) sum of:
        0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4
othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4
othervalue_5:bunker othervalue_5:bunner^5.997396E-4
othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
          1.0 = boost
          0.0 = queryNorm
        0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4
value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4
value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4
value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4
value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4
value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4
value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker
value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4
value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
          1.0 = boost
          0.0 = queryNorm

...

      0.0069078947 = coord(21/3040)
  0.0 = (MATCH) product of:
    0.0 = (MATCH) sum of:
      0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4
othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4
othervalue_5:bunker othervalue_5:bunner^5.997396E-4
othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
        1.0 = boost
        0.0 = queryNorm
      0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4
value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4
value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4
value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4
value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4
value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4
value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker
value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4
value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
        1.0 = boost
        0.0 = queryNorm

...

    0.0069078947 = coord(21/3040)

It looks like PRODUCT_OF and SUM_OF, which represent the Boolean logic, 
do not actually apply the boost?

Karl



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 2:36 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 2:17 PM,  karl.wri...@nokia.com wrote:
 The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any
 effect.  I can change it all over the place, and nothing much changes.

Then perhaps your language term doesn't actually match anything in the
index?  (i.e. how is it analyzed?)
Next step would be to get score explanations (just add debugQuery=true
if you're using Solr, or see IndexSearcher.explain() if not).

-Yonik
http://www.lucidimagination.com


Re: Odd Boolean scoring behavior?

2011-01-20 Thread Yonik Seeley

On Thu, Jan 20, 2011 at 3:06 PM,  karl.wri...@nokia.com wrote:
 I tried commenting out the final OR term, and that excluded all records that 
 were out-of-language as expected.  It's just the boost that doesn't seem to 
 work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the
boost in it - the fact that it's 0 suggests that you used a 0 boost.

Why don't you do a toString() on your query and see if it's what you expect.
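
To illustrate why a zero queryNorm hides any boost, here is a small sketch in plain Java. It only models the products that the explain output in this thread prints (boost, idf and queryNorm multiplied together). `QueryWeightSketch` and `queryWeight` are hypothetical names, not Lucene classes or methods, and the sketch does not reproduce how Lucene computes queryNorm itself.

```java
public class QueryWeightSketch {
    /**
     * Simplified model of the "product of" lines in the explain output:
     * the weight is taken to be boost * idf * queryNorm.
     */
    static double queryWeight(double boost, double idf, double queryNorm) {
        return boost * idf * queryNorm;
    }

    public static void main(String[] args) {
        // With queryNorm = 0.0, as in the explain output quoted in this thread,
        // the weight is zero no matter how large the boost is.
        System.out.println(queryWeight(10.0, 1.0, 0.0));

        // With a non-zero queryNorm (the value reported elsewhere in this thread),
        // the same boost becomes visible in the weight.
        System.out.println(queryWeight(10.0, 1.0, 0.015771711));
    }
}
```

In this model any single zero factor wipes out the whole product, which is why changing the boost "all over the place" would have no visible effect while queryNorm stays at 0.0.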

-Yonik
http://www.lucidimagination.com




RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright

The original query is fine, and has the boost as expected:

((+language:eng +(
CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) 
CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 
+value_0:hill)^0.5714286) 
CutoffQueryWrapper((+value_0:bunker~0.8332333 
+othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667) 
CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 
+value_0:hill)^0.5714286) 
CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
...
CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 
+value_7:hillmonument~0.8332333)^0.85714287) 
CutoffQueryWrapper((+value_7:bunker~0.8332333 
+othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0)
(
CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) 
CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 
+value_0:hill)^0.5714286) 
CutoffQueryWrapper((+value_0:bunker~0.8332333 
+othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667)
...
))

The rewritten query is odd.  Here's a sample:


((+language:eng +(
CutoffQueryWrapper((+() +value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) 
+value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286)

...

CutoffQueryWrapper((+() +(()^0.556))^0.85714287)))^3.0)
(
CutoffQueryWrapper((+() +value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) 
+value_0:hill)^0.667) 
CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
...
CutoffQueryWrapper((+() +(()^0.556))^0.85714287) 
CutoffQueryWrapper(+() +(()^0.667)) 
CutoffQueryWrapper((+() +(()^0.667))^0.85714287) 
CutoffQueryWrapper((+() +(()^0.556))^0.85714287)
)

As you can see, there are a lot of repeats, a lot of blank matches, but the 
original boost *is* still there.  I really can't interpret this any further - 
the many blank and repeated matches seem wrong to me, but the scorer 
explanation seems even more wrong.  Any ideas?

Karl


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 3:34 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 3:06 PM,  karl.wri...@nokia.com wrote:
 I tried commenting out the final OR term, and that excluded all records that 
 were out-of-language as expected.  It's just the boost that doesn't seem to 
 work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the
boost in it - the fact that it's 0 suggests that you used a 0 boost.

Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com




RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright

 (stuff) clauses is now consistent with 
the boost.  What I don't understand is why the first score winds up being lower 
than the second despite the boost.  Maybe it's just late and I don't see it, 
but it looks to me like for scoring the language clause is operating 
essentially independently of the Boolean AND clause, so my query construct just 
isn't going to do what I hoped.

Thoughts?  If I'm right, how *could* I do what I'm trying to do?
Karl


-Original Message-
From: Wright Karl (Nokia-MS/Boston)
Sent: Thursday, January 20, 2011 7:49 PM
To: dev@lucene.apache.org; yo...@lucidimagination.com
Subject: RE: Odd Boolean scoring behavior?

So I think I understand where the blank values and repeats come from.  Those 
are the expansions of fuzzy queries against fields that have no matches 
whatsoever for the fuzzy values in question. So those are indeed OK.

I guess then that the problem is that the scoring explanation makes no sense.  
I'm going to pick that apart and see why not next.

Karl

