Re: Passage Search

Christoph Goller Fri, 28 Jan 2005 01:04:08 -0800

Joaquin Delgado wrote:

What is described here as "Passage Search" is nothing more than a
PhraseQuery with a large slop. I think it's a UI problem rather than a
ranking algorithm. For example, you may want to translate simple
multi-term queries into a PhraseQuery by default (instead of AND or OR).
Let's say you convert the query into <"man bites dog"~1000> every
time someone types just <man bites dog>, so as to give more weight to
documents/passages that contain the words near each other, like
Google does by default.
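
The rewrite suggested above could be sketched roughly as follows. This is only an illustration in Python that operates on the query string before it reaches the query parser; the function name to_sloppy_phrase and the rule for leaving already-quoted input alone are assumptions, not part of Lucene.

```python
def to_sloppy_phrase(user_query: str, slop: int = 1000) -> str:
    """Wrap a plain multi-term query in a sloppy phrase query string.

    Hypothetical helper: it only builds query-parser syntax of the form
    "term1 term2"~slop and does not call any Lucene API.
    """
    terms = user_query.split()
    # Leave single terms and queries the user already quoted untouched.
    if len(terms) < 2 or '"' in user_query:
        return user_query
    return '"{}"~{}'.format(" ".join(terms), slop)


if __name__ == "__main__":
    print(to_sloppy_phrase("man bites dog"))
```

A real front end would also have to decide what to do with operators such as AND, OR, or field prefixes in the user's input; this sketch ignores them.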

A PhraseQuery with a large slop should indeed provide the desired behavior, and Similarity provides a means of specifying how sloppiness is scored. There are, however, some caveats. As Paul already said, Lucene scores documents, not passages. A document with lots of "dog bites man" might get a higher score than one with only one "man bites dog". Furthermore, the sloppiness of a phrase match is a single value into which both gaps (intervening text) and reversed order of phrase components are compiled. These two types of deviation from the specified phrase cannot easily be distinguished, and therefore they cannot be penalized separately. Last but not least, PhraseQuery requires all components of the phrase to match. If one wants exactly the behavior of the described passage search, one would have to modify/subclass PhraseQuery. I think this would not be too difficult.
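
To illustrate why gaps and reversed order end up in one number, here is a simplified Python model of a sloppy phrase match. It is a sketch, not Lucene's actual scorer: it assumes each phrase term occurs exactly once in the document, and the names match_distance and sloppy_weight are made up for this example. The weight 1 / (distance + 1) follows what the default Similarity does with the match distance, as far as I know.

```python
def match_distance(phrase, doc_tokens):
    """Single 'sloppiness' value for one match of phrase in doc_tokens.

    Each term's document position is shifted by its offset in the phrase;
    an exact match shifts every term to the same value, so the spread
    (max - min) is 0. Raises ValueError if a phrase term is missing,
    mirroring the requirement that all components must match.
    """
    shifted = [doc_tokens.index(term) - i for i, term in enumerate(phrase)]
    return max(shifted) - min(shifted)


def sloppy_weight(distance):
    """Weight given to a match at this distance (assumed default form)."""
    return 1.0 / (distance + 1)


if __name__ == "__main__":
    phrase = ["man", "bites", "dog"]
    docs = [
        "man bites dog",
        "man often bites the dog",
        "dog bites man",
        "man a b c d bites dog",
    ]
    for text in docs:
        d = match_distance(phrase, text.split())
        print(text, "->", d, sloppy_weight(d))
```

In this model the reversed phrase "dog bites man" and the in-order "man a b c d bites dog" both come out at distance 4, so a scorer that sees only the distance cannot penalize reversal differently from intervening text.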

regards,
Christoph
