Thank you Erick! Will give it a shot! On 4/11/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Go for a HitCollector. In particular, TopDocs will give you the raw scores. Erick On 4/11/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote: > > Hi Grant. > > Yes, I'm getting the score from the Hits collection. And yes, they get > normalized to 1; which is what I don't want. > > Or, I can leave the Hits objects as is, but I know Lucene also must > calculate a raw difference as part of the overall score calculation. > How can I get at that value? > > Thanks! > > Mike > > > On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > > > Have you looked at the explains to see what is coming out of the > > FuzzyQuery? Also, are you using Hits to get that score? Scores get > > normalized to 1 by that process. > > > > -Grant > > On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote: > > > > > Hello. > > > > > > I am using Lucene to submit fuzzy queries against an index. I have > > > noticed > > > that relevant matches are often retreived, but the scoring is not > > > at all > > > what I expected. > > > > > > For example, if my query is "rightches~", a reference to a text > > > file with > > > the single word "righteous" is returned with a score of 100 percent. > > > However, I think the actual score should be somewhere in the > > > neighborhood of > > > .66, not 1. Anyone follow me? Degree of similarity is what I want > > > in this > > > case. > > > > > > But Lucene score does not take into account how well a term matches a > > > FuzzyQuery. That just seems to be the way Lucene is built > > > currently. The > > > score is based on term frequency of the actual matching term. > > > FuzzyQuery > > > gets rewritten as a BooleanQuery with all matching terms OR'd. > > > > > > Degree of similarity is what I want in this case. When > > > "rightches~" matches > > > "rightheous", I should get a similarity score of about .66. > > > > > > What I want is to get at the raw difference that Lucene uses: the > > > Levenstein distance algorithm. I think I'll need to use the code in > > > FuzzyTermEnum.java (or .cs) as a starting point. I figure I can can > > > probably > > > use that code directly somehow, or at least borrow the similarity > > > computation. > > > > > > Frankly, though, I'm not sure I'm treading down the right path on > > > this. Can > > > anyone help with specifics, past experience, or examples? > > > > > > Cheers, > > > Mike > > > > -------------------------- > > Grant Ingersoll > > Center for Natural Language Processing > > http://www.cnlp.org > > > > Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/ > > LuceneFAQ > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > >