Hi!

On 19.05.2008, at 22:46, Radu Spineanu wrote:

Hi,

Can Ferret search for a combination of words and return the distance between them in a text?

It won't return the distance directly, but since Ferret stores term positions, it should be possible to determine the distance between different terms manually. You can also issue sloppy phrase queries (a phrase followed by ~n) that only return hits where the terms are separated by at most n other terms. The QueryParser API docs and the Ferret book have examples of this.
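A minimal sketch of the "determine the distance manually" idea, assuming you already have each term's positions for a document. Here `term_positions` is a hypothetical stand-in for Ferret's analyzer (lowercase, split on non-alphanumerics); in a real index you would read the positions from stored term vectors instead, which is not shown. The distance is the difference in positions, so adjacent terms have distance 1.

```ruby
# Hypothetical stand-in for an analyzer: maps each token to its positions.
def term_positions(text)
  positions = Hash.new { |h, k| h[k] = [] }
  text.downcase.scan(/[a-z0-9]+/).each_with_index do |token, pos|
    positions[token] << pos
  end
  positions
end

# Smallest position difference between any occurrence of term_a and term_b.
# Both position lists are sorted, so a two-pointer walk finds the minimum.
def min_distance(positions, term_a, term_b)
  a = positions.fetch(term_a, [])
  b = positions.fetch(term_b, [])
  return nil if a.empty? || b.empty?

  i = j = 0
  best = nil
  while i < a.size && j < b.size
    d = (a[i] - b[j]).abs
    best = d if best.nil? || d < best
    # Advance whichever pointer is behind to move toward a closer pair.
    if a[i] < b[j]
      i += 1
    else
      j += 1
    end
  end
  best
end

text = "The quick red fox jumps over the lazy dog. A fox is red."
puts min_distance(term_positions(text), "red", "fox") # → 1
```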

If it exists, is there a way to improve on this by checking whether they are separated by a certain character (like '.' for different sentences)?

Usually you don't index characters like '.' at all (they are removed during analysis, when the text is split up into tokens), but if you changed that so that sentence endings end up in the index as special terms, this might be possible, too.
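A rough sketch of the sentence-marker idea, assuming an analysis step that emits a special term for '.', '!' and '?'. The marker name `_eos_` and both helper functions are hypothetical; a real setup would need a custom Ferret analyzer, which is not shown. The scan remembers the last position of each term within the current sentence and resets at every marker, so only same-sentence pairs are measured.

```ruby
# Hypothetical marker term standing in for a sentence ending.
SENTENCE_END = "_eos_"

# Crude tokenizer that keeps sentence-ending punctuation as a marker term.
def tokenize_with_sentence_ends(text)
  text.downcase.scan(/[a-z0-9]+|[.!?]/).map do |t|
    %w[. ! ?].include?(t) ? SENTENCE_END : t
  end
end

# Minimum position distance between term_a and term_b, counting only pairs
# that are not separated by a sentence marker. Returns nil if there is none.
def min_distance_within_sentence(tokens, term_a, term_b)
  best = nil
  last_a = last_b = nil
  tokens.each_with_index do |tok, pos|
    if tok == SENTENCE_END
      last_a = last_b = nil # a sentence boundary forgets earlier occurrences
    elsif tok == term_a
      best = [best, pos - last_b].compact.min if last_b
      last_a = pos
    elsif tok == term_b
      best = [best, pos - last_a].compact.min if last_a
      last_b = pos
    end
  end
  best
end

tokens = tokenize_with_sentence_ends("The red car. A fox ran.")
p min_distance_within_sentence(tokens, "red", "fox") # → nil (different sentences)
```

Note that the marker occupies a position of its own, so distances measured across the whole index would count it as a term.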

I don't know your use case, but keep in mind that you can rank documents in which the terms appear closer together higher by chaining phrase queries with different slop values and assigning them different boosts:

("red fox")^15 OR ("red fox"~4)^10 OR ("red fox"~10)^5 OR ("red fox"~100)

This boosts exact matches the most and assigns lower boosts to matches where the terms are farther apart. Maybe something like this is already a 'good enough' solution to your problem?
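If the tiers need tuning, the query string can be generated rather than typed by hand. This is a small sketch; `proximity_query` and the tier values are illustrative, and it assumes the phrase contains no double quotes. The result is a plain query string that would be handed to Ferret's query parser (not shown).

```ruby
# Builds a chained proximity query: each tier is [slop, boost]; a nil boost
# leaves the clause unboosted, and slop 0 means an exact phrase.
def proximity_query(phrase, tiers)
  tiers.map { |slop, boost|
    clause = slop.zero? ? %("#{phrase}") : %("#{phrase}"~#{slop})
    boost ? "(#{clause})^#{boost}" : "(#{clause})"
  }.join(" OR ")
end

TIERS = [[0, 15], [4, 10], [10, 5], [100, nil]]
puts proximity_query("red fox", TIERS)
# → ("red fox")^15 OR ("red fox"~4)^10 OR ("red fox"~10)^5 OR ("red fox"~100)
```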

cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
