Re: How to access Levenstein distance number?

Grant Ingersoll Wed, 11 Apr 2007 03:44:29 -0700

Have you looked at the explains to see what is coming out of the
FuzzyQuery? Also, are you using Hits to get that score? Scores get
normalized to 1 by that process.


-Grant
On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:

Hello.
I am using Lucene to submit fuzzy queries against an index. I have noticed
that relevant matches are often retrieved, but the scoring is not at all
what I expected.
For example, if my query is "rightches~", a reference to a text file with
the single word "righteous" is returned with a score of 100 percent.
However, I think the actual score should be somewhere in the neighborhood of
.66, not 1. Anyone follow me? Degree of similarity is what I want in this
case.

But the Lucene score does not take into account how well a term matches a
FuzzyQuery. That just seems to be the way Lucene is built currently. The
score is based on the term frequency of the actual matching term. FuzzyQuery
gets rewritten as a BooleanQuery with all matching terms OR'd.
Degree of similarity is what I want in this case. When "rightches~" matches
"righteous", I should get a similarity score of about .66.

What I want is to get at the raw difference that Lucene uses: the
Levenshtein distance algorithm. I think I'll need to use the code in
FuzzyTermEnum.java (or .cs) as a starting point. I figure I can probably
use that code directly somehow, or at least borrow the similarity
computation.
Frankly, though, I'm not sure I'm treading down the right path on this. Can
anyone help with specifics, past experience, or examples?
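
For illustration, here is a minimal standalone sketch of the kind of
similarity computation described above. It is not Lucene's FuzzyTermEnum
code; the class and method names are hypothetical, and normalizing the edit
distance by the length of the shorter string is an assumption made for this
example.

```java
// Hypothetical helper, not part of Lucene.
final class LevenshteinSimilarity {

    /** Edit distance where insert, delete and substitute each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into the first j chars of b takes j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletes turn the first i chars of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assumed normalization: 1 - distance / length of the shorter string.
     * The result can go negative when the lengths differ a lot.
     */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) distance(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(distance("rightches", "righteous"));
        System.out.println(similarity("rightches", "righteous"));
    }
}
```

With this normalization, "rightches" and "righteous" differ by three
substitutions over nine characters, which gives a similarity of roughly .67,
in line with the figure expected above.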

Cheers,
Mike


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




