Re: How to access Levenshtein distance number?

Michael Barbarelli Wed, 11 Apr 2007 08:16:32 -0700

Thank you Erick!  Will give it a shot!

On 4/11/07, Erick Erickson <[EMAIL PROTECTED]> wrote:


Go for a HitCollector. In particular, TopDocs will give you the raw
scores.

Erick

On 4/11/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote:
>
> Hi Grant.
>
> Yes, I'm getting the score from the Hits collection.  And yes, they get
> normalized to 1; which is what I don't want.
>
> Or, I can leave the Hits objects as is, but I know Lucene also must
> calculate a raw difference as part of the overall score calculation.
> How can I get at that value?
>
> Thanks!
>
> Mike
>
>
> On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> >
> > Have you looked at the explains to see what is coming out of the
> > FuzzyQuery?  Also, are you using Hits to get that score?  Scores get
> > normalized to 1 by that process.
> >
> > -Grant
> > On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:
> >
> > > Hello.
> > >
> > > I am using Lucene to submit fuzzy queries against an index. I have
> > > noticed
> > > > that relevant matches are often retrieved, but the scoring is not
> > > at all
> > > what I expected.
> > >
> > > For example, if my query is "rightches~", a reference to a text
> > > file with
> > > the single word "righteous" is returned with a score of 100 percent.
> > > However, I think the actual score should be somewhere in the
> > > neighborhood of
> > > .66, not 1. Anyone follow me?  Degree of similarity is what I want
> > > in this
> > > case.
> > >
> > > > But Lucene score does not take into account how well a term matches a
> > > > FuzzyQuery. That just seems to be the way Lucene is built
> > > currently. The
> > > score is based on term frequency of the actual matching term.
> > > FuzzyQuery
> > > gets rewritten as a BooleanQuery with all matching terms OR'd.
> > >
> > > Degree of similarity is what I want in this case.  When
> > > "rightches~" matches
> > > > "righteous", I should get a similarity score of about .66.
> > >
> > > What I want is to get at the raw difference that Lucene uses:  the
> > > > Levenshtein distance algorithm.  I think I'll need to use the code in
> > > > FuzzyTermEnum.java (or .cs) as a starting point. I figure I can
> > > probably
> > > use that code directly somehow, or at least borrow the similarity
> > > computation.
> > >
> > > Frankly, though, I'm not sure I'm treading down the right path on
> > > this.  Can
> > > anyone help with specifics, past experience, or examples?
> > >
> > > Cheers,
> > > Mike
> >
> > --------------------------
> > Grant Ingersoll
> > Center for Natural Language Processing
> > http://www.cnlp.org
> >
> > Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/
> > LuceneFAQ
> >
> >
> >
>
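
For anyone following the thread, the similarity Mike is after can be sketched in plain Java without touching Lucene at all. This is only a rough illustration: the class name FuzzySimilarity and both method names are made up for this example, and the formula (1 minus the edit distance divided by the length of the shorter string) is an assumption in the spirit of what FuzzyTermEnum does, not a copy of its code.

```java
// Hypothetical helper, not part of Lucene: plain Levenshtein distance plus
// a similarity value derived from it.
final class FuzzySimilarity {

    // Two-row dynamic-programming Levenshtein distance
    // (unit cost for insert, delete, and substitute).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed formula: 1 - distance / length of the shorter string.
    // The result can drop below 0 for very different strings; clamp if needed.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) distance(a, b) / shorter;
    }
}
```

With these definitions, FuzzySimilarity.distance("rightches", "righteous") is 3 (three substitutions after the shared prefix "right"), so the similarity works out to 1 - 3/9, roughly 0.67, which lines up with the .66 Mike expected.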
