On Jun 9, 2004, at 4:26 AM, gaudinat wrote:
What does exactly happen with three words or more when we do a proximity search?
such as: "lucene jakarta best"~10
Is each word can be at a distance of 10 of each others, or is there an other behaviour?

The total number of "hops" to put the words in order is calculated and if it is less than or equal to 10 there is a match.


For example, a field was indexed with "the quick brown fox jumps...".

"quick brown" matches
"quick fox"~1 matches (# of hops = 1)
"quick jumps"~1 does not match (# hops = 2)
"the brown jumps"~1 does not match (# hops = 2)
"fox brown"~1 does NOT match (# hops is actually 2)

By the way, I would like to know if someone use this lucene feature regularly?

I suspect many people do. You can also set a default slop factor on QueryParser so users don't have to use the ~10 syntax.


For my part I would like to use this feature to improve the precision of the finding documents by using the word position.

Also have a look at the new (in Lucene 1.4) SpanQuery family. It has a less confusing "slop factor" and you can control whether the terms must be in order or not (with SpanNearQuery). QueryParser does not support it currently, but subclassing and overriding getFieldQuery makes it possible.


        Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to