RE: Odd Boolean scoring behavior?
Turns out that I inadvertently reverted one of Simon's changes to CutoffQueryWrapper, which explains the second effect. So all is now well.

Thanks for your assistance!

Karl

From: Wright Karl (Nokia-MS/Boston)
Sent: Thursday, January 20, 2011 9:44 PM
To: dev@lucene.apache.org
Subject: RE: Odd Boolean scoring behavior?

Found the cause of the zero querynorms, and fixed it. But the results are still not as I would expect. The first result has language=ger but scores higher than the second result which has language=eng. And yet, my query is boosting like this:

Boolean
  OR Boolean (boost = 100.0)
       AND (language:eng)
       AND (stuff)
  OR (stuff)

... where (stuff) is the same stuff in both cases.

Here's the scoring for two results, the first one out of language, and the second one in language:

0.018082526 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@6cdcb5eb sum of:
  0.018059647 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@e2b8f23 sum of:
    0.015771711 = (MATCH) weight(language:eng in 52867945), product of:
      0.015771711 = queryWeight(language:eng), product of:
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        0.015771711 = queryNorm
      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
        1.0 = tf(termFreq(language:eng)=0)
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        1.0 = fieldNorm(field=language, doc=52867945)
    0.0022879362 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 product of:
      0.331206 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 sum of:
        0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.667), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(othervalue_5:monument othervalue_5:monumento^7.9949305E-4 othervalue_5:monuments^7.9949305E-4))^0.5714286), product of:
          1.0 = boost
          0.015771711 = queryNorm
        0.015771711 = (MATCH) CutoffQueryWrapper((+value_5:hill +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4
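The numbers in the explanation above can be checked by hand. What follows is a minimal sketch in plain Java with no Lucene dependency. The class and method names are made up for illustration, and it assumes that the coord factor is simply the number of matching clauses divided by the total number of clauses, which is what coord(21/3040) in the earlier explain output suggests.

```java
/** Recomputes figures from the explain output; plain arithmetic only. */
final class ExplainArithmetic {

    /** Assumed coord definition: matching clauses divided by total clauses. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Score of an OR of optional clauses: sum of matching clause scores times coord. */
    static float disjunctionScore(float sumOfClauseScores, int overlap, int maxOverlap) {
        return sumOfClauseScores * coord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        float queryNorm = 0.015771711f;

        // language:eng clause: queryWeight (idf 1.0 * queryNorm) times fieldWeight (1.0)
        float languageClause = 1.0f * queryNorm * 1.0f;

        // 21 matching wrappers, each contributing boost (1.0) * queryNorm
        float stuffSum = 21 * 1.0f * queryNorm;
        float stuff = disjunctionScore(stuffSum, 21, 3040);

        System.out.println("coord            = " + coord(21, 3040));
        System.out.println("sum of wrappers  = " + stuffSum);
        System.out.println("(stuff) score    = " + stuff);
        System.out.println("boosted AND part = " + (languageClause + stuff));
    }
}
```

Seen this way, the language clause contributes a fixed amount of one queryNorm, while the 21 matching (stuff) clauses are scaled down by a coord of 21 out of 3040, so almost all of the AND branch's score comes from the language term.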
RE: Odd Boolean scoring behavior?
BTW: What is CutOffQueryWrapper?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
RE: Odd Boolean scoring behavior?
This is a query that wraps another query and limits the number of results returned from it to some specific number. It is very helpful when you have a lot of clauses in a query and each of them is expected to match only a few documents, but there is a chance that one clause returns lots of results, which by definition are less than useful. For example:

OR
  cutoff(clause_1)
  cutoff(clause_2)
  cutoff(clause_3)
  ...

... where clause_N can potentially match something very common, e.g. "of" or "road", but where you can't just treat "of" and "road" as stop words.

Hope this helps.

Karl
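The idea described above can be sketched outside Lucene. This is not the actual CutoffQueryWrapper, whose source does not appear in this thread. CutoffIterator is a hypothetical name, and the sketch only shows the core behavior under the assumption that a clause yields its matching document ids one at a time and the wrapper stops handing them out after a fixed limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration: passes through at most `cutoff` matches of a wrapped clause. */
final class CutoffIterator implements Iterator<Integer> {
    private final Iterator<Integer> delegate;
    private int remaining;

    CutoffIterator(Iterator<Integer> delegate, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be >= 0");
        }
        this.delegate = delegate;
        this.remaining = cutoff;
    }

    @Override
    public boolean hasNext() {
        // Stop as soon as the budget is used up, even if the clause has more matches.
        return remaining > 0 && delegate.hasNext();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return delegate.next();
    }

    /** Collects everything an iterator yields into a list. */
    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // A clause on a very common term matches many documents; only the first 3 survive.
        List<Integer> commonTermMatches = Arrays.asList(2, 5, 8, 13, 21, 34);
        System.out.println(drain(new CutoffIterator(commonTermMatches.iterator(), 3)));
    }
}
```

A clause that matches fewer documents than the limit passes through unchanged, so the wrapper only affects the overly common clauses.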
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote:
> The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes.

Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?)

Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not).

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Odd Boolean scoring behavior?
I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work.

Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result.

0.0 = (MATCH) sum of:
  0.0 = (MATCH) sum of:
    0.0 = (MATCH) weight(language:eng in 52867945), product of:
      0.0 = queryWeight(language:eng), product of:
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        0.0 = queryNorm
      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
        1.0 = tf(termFreq(language:eng)=0)
        1.0 = idf(docFreq=23889670, maxDocs=59327671)
        1.0 = fieldNorm(field=language, doc=52867945)
    0.0 = (MATCH) product of:
      0.0 = (MATCH) sum of:
        0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
          1.0 = boost
          0.0 = queryNorm
        0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
          1.0 = boost
          0.0 = queryNorm
        ...
      0.0069078947 = coord(21/3040)
  0.0 = (MATCH) product of:
    0.0 = (MATCH) sum of:
      0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
        1.0 = boost
        0.0 = queryNorm
      0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
        1.0 = boost
        0.0 = queryNorm
      ...
    0.0069078947 = coord(21/3040)

It looks like the PRODUCT_OF and SUM_OF nodes, which represent the Boolean logic, do not actually apply the boost?

Karl
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 3:06 PM, karl.wri...@nokia.com wrote:
> I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the boost in it - the fact that it's 0 suggests that you used a 0 boost. Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com
RE: Odd Boolean scoring behavior?
The original query is fine, and has the boost as expected:

((+language:eng +(
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667)
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667)
    CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
    ...
    CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 +value_7:hillmonument~0.8332333)^0.85714287)
    CutoffQueryWrapper((+value_7:bunker~0.8332333 +othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0)
 (
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667)
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667)
    ...
 ))

The rewritten query is odd. Here's a sample:

((+language:eng +(
    CutoffQueryWrapper((+() +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    ...
    CutoffQueryWrapper((+() +(()^0.556))^0.85714287)))^3.0)
 (
    CutoffQueryWrapper((+() +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    ...
    CutoffQueryWrapper((+() +(()^0.556))^0.85714287)
    CutoffQueryWrapper(+() +(()^0.667))
    CutoffQueryWrapper((+() +(()^0.667))^0.85714287)
    CutoffQueryWrapper((+() +(()^0.556))^0.85714287)
 )

As you can see, there are a lot of repeats, a lot of blank matches, but the original boost *is* still there. I really can't interpret this any further - the many blank and repeated matches seem wrong to me, but the scorer explanation seems even more wrong. Any ideas?

Karl
RE: Odd Boolean scoring behavior?
(stuff) clauses is now consistent with the boost. What I don't understand is why the first score winds up being lower than the second despite the boost. Maybe it's just late and I don't see it, but it looks to me like for scoring the language clause is operating essentially independently of the Boolean AND clause, so my query construct just isn't going to do what I hoped.

Thoughts? If I'm right, how *could* I do what I'm trying to do?

Karl

-----Original Message-----
From: Wright Karl (Nokia-MS/Boston)
Sent: Thursday, January 20, 2011 7:49 PM
To: dev@lucene.apache.org; yo...@lucidimagination.com
Subject: RE: Odd Boolean scoring behavior?

So I think I understand where the blank values and repeats come from. Those are the expansions of fuzzy queries against fields that have no matches whatsoever for the fuzzy values in question. So those are indeed OK. I guess then that the problem is that the scoring explanation makes no sense. I'm going to pick that apart and see why not next.

Karl
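The blank +() clauses in the rewritten query can be reproduced with a toy model of fuzzy expansion. This is only a sketch, not the code Lucene runs: the class and method names are hypothetical, and the similarity measure (one minus edit distance divided by the length of the shorter term) is an assumption modeled on how classic fuzzy matching is commonly described. Under that assumption, a threshold of 0.8332333 on the six-letter term "bunker" admits terms within one edit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy fuzzy expansion: which terms of a field are close enough to the query term. */
final class FuzzyExpansionSketch {

    /** Plain Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed similarity: 1 - distance / length of the shorter term. */
    static float similarity(String query, String candidate) {
        int shorter = Math.min(query.length(), candidate.length());
        if (shorter == 0) {
            return 0f;
        }
        return 1.0f - editDistance(query, candidate) / (float) shorter;
    }

    /** Returns the field's terms that exceed the threshold; may be empty. */
    static List<String> expand(String query, float minSimilarity, List<String> fieldTerms) {
        List<String> matches = new ArrayList<>();
        for (String term : fieldTerms) {
            if (similarity(query, term) > minSimilarity) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // A field containing near matches expands to a small OR of terms.
        System.out.println(expand("bunker", 0.8332333f, Arrays.asList("bunker", "burker", "hill")));
        // A field with no near matches expands to nothing, i.e. an empty clause.
        System.out.println(expand("bunker", 0.8332333f, Arrays.asList("hill", "road")));
    }
}
```

When the expansion for a field is empty, the required clause it feeds has nothing to match, which is consistent with the observation in the thread that those wrappers are harmless.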