Should we change the scoring behaviour of FuzzyQuery?

The current approach of rewriting Foo~ into a large
BooleanQuery (one clause per matching term) means that
the scores of matching documents are heavily diluted.

In my tests a search for Foo returns documents
containing Foo with a score of 1.
A search for Foo~ returns documents containing Foo
with a score of just 0.01 (this was the top score).
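
Roughly, the comparison looks like this (just a sketch -
the field name, analyzer and the handful of similar terms
are placeholders rather than my actual test index, and it
assumes the current 1.4-style API):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;

public class FuzzyScoreCheck
{
  public static void main(String[] args) throws Exception
  {
    // index a few docs with terms similar to "Foo" so the
    // fuzzy query expands to several clauses
    RAMDirectory dir = new RAMDirectory();
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter writer = new IndexWriter(dir, analyzer, true);
    String[] terms = {"Foo", "Food", "Fool", "Foot", "Fog"};
    for (int i = 0; i < terms.length; i++)
    {
      Document doc = new Document();
      doc.add(Field.Text("body", terms[i]));
      writer.addDocument(doc);
    }
    writer.close();

    IndexSearcher searcher = new IndexSearcher(dir);
    QueryParser qp = new QueryParser("body", analyzer);

    // exact term query - the doc containing "Foo" comes
    // back with a "full" score
    Hits hits = searcher.search(qp.parse("Foo"));
    System.out.println("Foo  -> " + hits.score(0));

    // fuzzy query - same doc, but its score is dragged
    // down, largely by the coord factor applied to the
    // expanded BooleanQuery
    hits = searcher.search(qp.parse("Foo~"));
    System.out.println("Foo~ -> " + hits.score(0));
  }
}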

I know Lucene scoring isn't guaranteed to return values
in the range of 0 to 1, but I think we should make some
attempt to avoid scoring inconsistencies like the one
above.

To this end, I have tried changing FuzzyQuery to
internally use the class below, which ignores the
coordination factor (the fraction of the expanded
query's terms that a document matches):

class FuzzyBooleanQuery extends BooleanQuery
{
  public Similarity getSimilarity(Searcher searcher)
  {
    // use the default Similarity but neutralise the
    // coordination factor, so a document is not penalised
    // for matching only one of the many expanded terms
    return new DefaultSimilarity()
    {
      public float coord(int overlap, int maxOverlap)
      {
        return 1.0f;
      }
    };
  }
}

This seems to produce more realistic scores and appears
to preserve the same sort order.
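
To show just the coord change in isolation, here is a
contrived sketch (hypothetical field and terms, using the
old add(query, required, prohibited) form) that puts the
same optional clauses into a plain BooleanQuery and into
the class above:

// same clauses, with and without the coord factor
BooleanQuery plain = new BooleanQuery();
FuzzyBooleanQuery noCoord = new FuzzyBooleanQuery();

// stand-ins for the terms a fuzzy expansion might produce
String[] expanded = {"foo", "food", "fool", "foot"};
for (int i = 0; i < expanded.length; i++)
{
  Term t = new Term("body", expanded[i]);
  plain.add(new TermQuery(t), false, false);    // optional clause
  noCoord.add(new TermQuery(t), false, false);  // optional clause
}

// a document containing only "foo" matches 1 of the 4
// clauses: "plain" multiplies its score by
// coord(1, 4) = 0.25, while "noCoord" leaves the score
// alone - hence the more realistic numbers above.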

Any views?

Cheers
Mark