Re: product based term combination for BooleanQuery?

Grant Ingersoll Tue, 03 Jul 2007 13:28:10 -0700

When you do an explain on these results, what are all the factors that contribute to the score?

Could you increase the coord() factor in a custom Similarity implementation, to give a bigger boost to documents that have more matching terms? The point of coord is to give a little bump to those docs that have more terms from the query in a given document. Sounds like you want a bigger bump once you have multiple query terms in a document. Would this work for you?
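A minimal sketch of the coord idea above, written as plain arithmetic rather than against the Lucene API. `default_coord` mirrors the overlap / maxOverlap ratio that DefaultSimilarity uses; `steeper_coord` and its `exponent` parameter are hypothetical names chosen for illustration, and the exponent is only one possible way to enlarge the bump.

```python
def default_coord(overlap, max_overlap):
    """Fraction of the query's clauses that matched (overlap / maxOverlap)."""
    return overlap / max_overlap


def steeper_coord(overlap, max_overlap, exponent=2.0):
    """Hypothetical variant: raising the fraction to a power penalises
    partial matches harder, so full matches get a relatively bigger bump."""
    return (overlap / max_overlap) ** exponent


# Compare the two factors for a two-clause query.
for overlap in (1, 2):
    print(overlap, default_coord(overlap, 2), steeper_coord(overlap, 2))
```

In a real index the equivalent change would go into the coord() method of a custom Similarity subclass; whether it helps depends on how many clauses each document actually matches.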


Also, below...

On Jul 3, 2007, at 3:20 PM, Tim Sturge wrote:

That's true, but it's not clear that I want phrase matches. Consider for example:
"Lucene Download" as a query. I want something that strongly references "Lucene" (in the title) and strongly references "Download" but "Download Lucene" or "Lucene Project Download" are better than some page that happens to contain the exact phrase.

Not sure I follow you here. By strongly references, do you mean there are multiple occurrences of Download? Why would those alternatives be better than an exact phrase match?

Other examples are "camera review" or "Gonzales scandal"; there's a whole class of "subject <modifier>" queries that are not really phrase based, and my corpus isn't large enough to necessarily contain the phrase anyway.
I agree that many two or three word queries are really best matched by phrases, but not all. Is it common to use a phrase query with high slop to overcome the unequal weighting problem?
Also, my interface does support "\"John Bush\"" (i.e. the user can quote the phrase if they like) and I would prefer not to infer automatically that they meant to do so.
Tim

Jason Pump wrote:
You're not using any type of phrase search. Try ->
( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) )
or maybe
( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) )
Tim Sturge wrote:
I'm following myself up here to ask if anyone has experience or code with a BooleanQuery that weights the terms it encounters on a product basis rather than a sum basis.
This would effectively compute the geometric mean of the term scores (rather than the arithmetic mean) and would give me more "middle bias". It also has the great advantage that it automatically implements AND (as a document without the term has a score of 0.0 for it, which causes the whole query score to go to 0.0 as well).
I'm curious though why this doesn't already exist. Is it a bad idea in general (that I will discover once I implement it and look at the results?) or does it make searching a lot slower?
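To make the product idea concrete, here is a small sketch of the arithmetic only; it is not a BooleanQuery implementation. The function names and the per-term scores are made up for illustration. It shows the two properties described above: a zero term score zeroes the whole product (the implicit AND), and among documents with the same sum, the balanced one gets the highest geometric mean.

```python
import math


def sum_score(term_scores):
    """Sum-based combination of per-term scores."""
    return sum(term_scores)


def product_score(term_scores):
    """Product-based combination: any zero term zeroes the result."""
    return math.prod(term_scores)


def geometric_mean(term_scores):
    """n-th root of the product, comparable in scale to a single term score."""
    return math.prod(term_scores) ** (1.0 / len(term_scores))


# Illustrative per-term scores; all three documents have the same sum.
balanced = [4.0, 4.0]
lopsided = [7.0, 1.0]
missing = [8.0, 0.0]  # second term absent from the document

for name, scores in [("balanced", balanced), ("lopsided", lopsided), ("missing", missing)]:
    print(name, sum_score(scores), product_score(scores), geometric_mean(scores))
```

The sum cannot tell these three documents apart, while the product ranks them balanced, lopsided, missing.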
Thanks,

Tim

Tim Sturge wrote:
I have an index with two different sources of information, one small but of high quality (call it "title"), and one large, but of lower quality (call it "body"). I give boosts to certain documents related to their popularity (this is very similar to what one would do indexing the web).
The problem I have is a query like "John Bush". I translate that into "(title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush)". But the results I get are:
1. George Bush
...
4. John Kerry
...
10. John Bush

The reason is (looking at explain) that George Bush is scored:
169 = sum(
    1 = <match in body with tiny norm for "John">
    168 = sum(
        160 = <title match for "Bush">
        8 = <body match for "Bush">
    )
)
and John Kerry is similar but reversed. Poor old "John Bush" only scores:
72 = sum(
 40 = (<title match for "John">+<body match>)
 32 = (<title match for "Bush">+ <body match>)
)

because his initial boost was only 1/4 of George's.
The question I have is, how can I tell the searcher to care about "balance"? I really want the score over 2 terms to be more like (sqrt(X)+sqrt(Y))^2 or maybe even exp(log(X)+log(Y)) rather than just X+Y. Is that supported in some obvious way, or is there some other way to phrase my query to say "I want both terms but they should both be important if possible?"
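As a rough check of the three formulas, the sketch below plugs in the per-clause scores from the explain output above (1 and 168 for George Bush, 40 and 32 for John Bush). The function names are illustrative only, and the log form assumes both scores are strictly positive.

```python
import math


def combine_sum(x, y):
    """Plain sum, as in the current scoring."""
    return x + y


def combine_sqrt(x, y):
    """(sqrt(X) + sqrt(Y))^2, which equals X + Y + 2*sqrt(X*Y)."""
    return (math.sqrt(x) + math.sqrt(y)) ** 2


def combine_product(x, y):
    """exp(log(X) + log(Y)), i.e. the product X*Y; needs x > 0 and y > 0."""
    return math.exp(math.log(x) + math.log(y))


# Per-clause scores ("John" clause, "Bush" clause) taken from the explain output.
docs = {"George Bush": (1.0, 168.0), "John Bush": (40.0, 32.0)}

for name, (x, y) in docs.items():
    print(name, combine_sum(x, y), combine_sqrt(x, y), combine_product(x, y))
```

With these particular numbers the square-root form narrows the gap but still ranks George Bush first (about 194.9 against 143.6); only the product form puts John Bush on top (1280 against 168).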
Thanks,

Tim
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


