Have you looked at the explain output to see what is coming out of the FuzzyQuery? Also, are you using Hits to get that score? Hits normalizes scores relative to the top hit, so the best match can come back as 1.

-Grant
On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:

Hello.

I am using Lucene to submit fuzzy queries against an index. I have noticed that relevant matches are often retrieved, but the scoring is not at all
what I expected.

For example, if my query is "rightches~", a reference to a text file with
the single word "righteous" is returned with a score of 100 percent.
However, I think the actual score should be somewhere in the neighborhood of .66, not 1. Anyone follow me? Degree of similarity is what I want in this
case.

But the Lucene score does not take into account how well a term matches a
FuzzyQuery. That just seems to be the way Lucene is built currently. The score is based on the term frequency of the actual matching term. FuzzyQuery
gets rewritten as a BooleanQuery with all matching terms OR'd.

Degree of similarity is what I want in this case. When "rightches~" matches
"righteous", I should get a similarity score of about .66.

What I want is to get at the raw difference that Lucene uses: the
Levenshtein distance. I think I'll need to use the code in
FuzzyTermEnum.java (or .cs) as a starting point. I figure I can probably
use that code directly somehow, or at least borrow the similarity
computation.
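
To make the "borrow the similarity computation" idea concrete, here is a minimal standalone sketch of a Levenshtein-based similarity. It is not Lucene's code: the class and method names (FuzzySimilarity, levenshtein, similarity) are hypothetical, and dividing the distance by the length of the shorter string is an assumption about how FuzzyTermEnum scales it, so check the actual source before relying on matching numbers. The clamp to zero is also a choice made here.

```java
// Hypothetical helper, not part of Lucene.
final class FuzzySimilarity {

    // Classic Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the shorter string,
    // clamped so the result never drops below 0.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        float score = 1.0f - (float) levenshtein(a, b) / shorter;
        return Math.max(0.0f, score);
    }

    public static void main(String[] args) {
        System.out.println(similarity("rightches", "righteous"));
    }
}
```

For "rightches" against "righteous" the edit distance is 3 over 9 characters, so this gives roughly 0.67, in line with the .66 I was expecting. Applying it to each returned hit would still need the matched term, which the rewritten BooleanQuery does not expose directly.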

Frankly, though, I'm not sure I'm treading down the right path on this. Can
anyone help with specifics, past experience, or examples?

Cheers,
Mike

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



