Joaquin Delgado wrote:
What is described here as "Passage Search" is nothing more than a
PhraseQuery with a large slop. I think it's a UI problem rather than a
ranking algorithm. For example, you may want to translate simple
multi-term queries into a PhraseQuery by default (instead of AND or OR).
Let's say you convert the query into <"man bites dog"~1000> every
time someone types just <man bites dog>, so as to give more weight to
documents/passages that contain the words near each other, like
Google does.
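A minimal sketch of the rewriting Joaquin describes: plain multi-term input is wrapped in QueryParser proximity syntax (`"..."~N`) before it reaches the parser. The class and method names (`PhraseRewrite`, `toSloppyPhrase`) are hypothetical, and the sketch assumes the input contains no QueryParser special characters (quotes, operators) that would need escaping.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class PhraseRewrite {
    // Hypothetical helper: turns "man bites dog" into "\"man bites dog\"~slop".
    // Assumes the input has no characters with special meaning to QueryParser.
    static String toSloppyPhrase(String userInput, int slop) {
        String[] terms = userInput.trim().split("\\s+");
        if (terms.length < 2) {
            return userInput.trim(); // a single term needs no phrase wrapping
        }
        // Normalize whitespace between terms, then add quotes and the slop suffix.
        String joined = Arrays.stream(terms).collect(Collectors.joining(" "));
        return "\"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // Prints: "man bites dog"~1000
        System.out.println(toSloppyPhrase("man bites dog", 1000));
    }
}
```

The rewritten string would then be handed to the query parser as usual; single-term input is passed through unchanged because a one-word phrase gains nothing from slop.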

A PhraseQuery with a large slop should indeed provide the desired behavior,
and Similarity provides means for specifying sloppiness. There are, however,
some caveats. As Paul already said, Lucene scores documents, not passages.
A document with lots of "dog bites man" might get a higher score than one
with only one "man bites dog". Furthermore, the sloppiness of a phrase match
is a single value into which both gaps (intervening text) and
reversed order of phrase components are compiled. These two types of
deviation from the specified phrase cannot easily be distinguished,
and therefore they cannot be penalized separately. Last but not least,
PhraseQuery requires all components of the phrase to match. If one wants
exactly the behavior of the described passage search, one would have to modify or subclass PhraseQuery. I think this would not be too difficult.
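To illustrate why gaps and reversed order collapse into one value, here is a simplified sketch that approximates how a sloppy phrase match distance can be computed: each term's document position is shifted by its offset in the phrase, and the distance is the spread (max minus min) of the shifted positions. It assumes every phrase term occurs exactly once in the text and is not Lucene's actual scorer. `sloppyFreq` mirrors the `1 / (distance + 1)` formula used by Lucene's classic DefaultSimilarity; the class name `SlopDistance` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlopDistance {
    // Simplified match distance: assumes each phrase term occurs once in the text.
    // Returns -1 if a term is missing (PhraseQuery requires all terms to match).
    static int matchDistance(List<String> phrase, List<String> text) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < text.size(); i++) {
            pos.putIfAbsent(text.get(i), i); // first occurrence only
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phrase.size(); offset++) {
            Integer p = pos.get(phrase.get(offset));
            if (p == null) {
                return -1;
            }
            int shifted = p - offset; // position relative to where the phrase expects it
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Weight of a sloppy match, as in the classic DefaultSimilarity.sloppyFreq.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        List<String> query = List.of("man", "bites", "dog");
        List<String> reversed = List.of("dog", "bites", "man");
        List<String> gapped = List.of("man", "a", "b", "c", "d", "bites", "dog");
        // Both print 4: a full reversal and a four-word gap look the same.
        System.out.println(matchDistance(query, reversed));
        System.out.println(matchDistance(query, gapped));
    }
}
```

Since both deviations end up in the same integer, any Similarity that only sees that integer has to penalize them identically; separating them would require changing how the distance is computed, i.e. modifying the phrase scorer itself.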


regards,
Christoph
