Doug Cutting wrote:

David Spencer wrote:

But what is the right behavior when there are more than 2 terms? Does it generate a phrase for every pair of terms, like this (ignore fields, boosts, and proximity for a sec):

search for "t1 t2 t3" gives you these phrases in addition to the direct field matches:

"t1 t2"
"t2 t3"
"t1 t3"
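To make the pairwise expansion concrete, here is a minimal sketch that enumerates every two-term phrase for a list of query terms. The class and method names (PairwisePhrases, allPairs) are hypothetical and are not part of Lucene; the sketch only builds the phrase strings and does not construct any query objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: lists every unordered pair of
// query terms as a two-term phrase string.
class PairwisePhrases {
    static List<String> allPairs(List<String> terms) {
        List<String> phrases = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                // Keep the original query order inside each pair.
                phrases.add(terms.get(i) + " " + terms.get(j));
            }
        }
        return phrases;
    }
}
```

For n terms this produces n*(n-1)/2 phrases, so three terms give three phrases and ten terms give forty-five.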


What the sloppy phrase scorer does when slop=infinity is find the smallest windows containing all three terms and score each match based on the width of these windows, by summing Similarity.sloppyFreq(). That's what I was figuring we'd start with. We could alternatively construct all pairwise queries, but that could get expensive.
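As a rough illustration of the window idea, the sketch below sums a sloppyFreq-style value over the minimal windows that contain every term. It is a simplification and not Lucene's actual scorer: the names WindowFreqSketch and windowFreq are hypothetical, term order inside a window is ignored, each position is assumed to hold at most one query term, and the distance is assumed to be the window span minus (number of terms - 1), so that a contiguous run of all terms has distance 0. The 1/(distance+1) formula mirrors the default Similarity.sloppyFreq().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene's SloppyPhraseScorer.
class WindowFreqSketch {
    // Same shape as the default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // positions[t] holds the positions of query term t in one field.
    static float windowFreq(int[][] positions) {
        int n = positions.length;
        List<int[]> hits = new ArrayList<>(); // each entry is {position, termIndex}
        for (int t = 0; t < n; t++) {
            for (int p : positions[t]) {
                hits.add(new int[] {p, t});
            }
        }
        hits.sort(Comparator.comparingInt(h -> h[0]));

        int[] count = new int[n]; // occurrences of each term inside the window
        int covered = 0;          // number of distinct terms inside the window
        int left = 0;
        float freq = 0f;
        for (int right = 0; right < hits.size(); right++) {
            int rightTerm = hits.get(right)[1];
            if (count[rightTerm]++ == 0) {
                covered++;
            }
            // Drop leading hits whose term occurs again later in the window.
            while (count[hits.get(left)[1]] > 1) {
                count[hits.get(left)[1]]--;
                left++;
            }
            // The window is minimal only if both end terms occur exactly once.
            if (covered == n && count[rightTerm] == 1) {
                int span = hits.get(right)[0] - hits.get(left)[0];
                freq += sloppyFreq(span - (n - 1));
            }
        }
        return freq;
    }
}
```

Note that if any one term has no positions at all, no window covers every term and the result is 0, however often the other terms occur together.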

This use of slop may fail to reward a match enough when two of the terms frequently occur together as a phrase and the third appears only rarely in the text. Perhaps we should implement a new DensityPhraseQuery that does not require all terms but rewards documents for having more small gaps between distinct query terms. Similarity.sloppyFreq() could be called for all gaps and summed. So if two of three terms occurred five times as a phrase, but the third term didn't occur at all in the field, the freq would be 5.0 (since there would be five gaps of size zero). But if all three terms occurred as a phrase five times then the freq would be 10.0, since there would be ten gaps of size zero. Does this make sense? It would not be hard to implement.
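One possible reading of the gap-summing proposal, sketched under stated assumptions: merge the positions of all query terms, and for every pair of neighboring hits that belong to distinct terms add sloppyFreq(gap), where gap is the number of positions between them. DensityPhraseQuery does not exist in Lucene; the names DensityFreqSketch and densityFreq are hypothetical, and the 1/(distance+1) formula mirrors the default Similarity.sloppyFreq().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed density scoring, not a Lucene class.
class DensityFreqSketch {
    // Same shape as the default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // positions[t] holds the positions of query term t; a term may be absent.
    static float densityFreq(int[][] positions) {
        List<int[]> hits = new ArrayList<>(); // each entry is {position, termIndex}
        for (int t = 0; t < positions.length; t++) {
            for (int p : positions[t]) {
                hits.add(new int[] {p, t});
            }
        }
        hits.sort(Comparator.comparingInt(h -> h[0]));

        float freq = 0f;
        for (int i = 1; i < hits.size(); i++) {
            int[] prev = hits.get(i - 1);
            int[] cur = hits.get(i);
            // Only gaps between distinct query terms are rewarded.
            if (prev[1] != cur[1]) {
                freq += sloppyFreq(cur[0] - prev[0] - 1);
            }
        }
        return freq;
    }
}
```

Under this reading, one two-term phrase occurrence contributes 1.0 and one three-term phrase occurrence contributes 2.0, which matches the 5.0 and 10.0 figures above per occurrence. The gaps between separate occurrences also add a small amount when the neighboring hits are different terms, so real totals come out slightly above those round numbers unless such gaps are capped or ignored.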

Do folks agree that this is a good general formulation? If so, would someone like to contribute a version of MultiFieldQueryParser that implements this? The API should probably be something like:



I might already have this done, just confirm the above question re > 2 terms.


Did I confirm or deny?

Confirmed! Let me tweak my code and I'll post it for examination.



Doug


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]





