Another thought on fuzzy scoring:
shouldn't all of the queries that automatically expand
terms favour common words over rare ones? The default
scoring behaviour at the moment favours rare words. As
a user, aren't I more likely to be looking for the most
common expansions?

If I'm not sure how to spell a word I might search for:
accomodation~
or
accom*
The fuzzy scoring algorithms will currently favour all
of the misspellings of accommodation in the ranking
of results, because the misspellings are rarer in the
index and so get a higher idf.
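
To put rough numbers on this, here is a small Python sketch.
It assumes an idf of the classic form 1 + ln(numDocs / (docFreq + 1));
the index size and document frequencies are made up purely for
illustration.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Assumed idf shape: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index size and document frequencies (not real data).
NUM_DOCS = 100_000
doc_freqs = {
    "accommodation": 5000,  # correct spelling, common
    "accomodation": 40,     # misspelling, rare
    "acommodation": 3,      # misspelling, very rare
}

idf_by_term = {t: classic_idf(df, NUM_DOCS) for t, df in doc_freqs.items()}

# Highest idf first: the rarest misspelling comes out on top.
ranked_by_idf = sorted(idf_by_term, key=idf_by_term.get, reverse=True)
print(ranked_by_idf)
```

With these made-up numbers the rarest misspelling gets the
largest weight and the correct spelling the smallest, which is
the opposite of what I'd want as a user.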

Ideally, within the expansions of a term, the score
contribution should be based on df (document frequency,
as opposed to the usual inverse, idf) BUT within the
overall query the usual idf scheme should still apply.
To clarify:
If I search for:
  the cheapest accomodation~ in london
I want to see the most common spellings of
accommodation before all other variants of this word
BUT I then want these variants scored against the
OTHER words ("in", "the" etc) on the usual basis of
rarity.

This suggests a sort order within another, different
sort order.
This seems like it would not be easy to do. Any bright
ideas?
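
One rough direction, sketched in Python rather than against any
real scoring API: weight each variant inside an expansion by its
df relative to the most common variant, then treat the whole
expansion as a single pseudo-term when computing idf against the
other query words. Every name and number here is hypothetical,
the idf formula is an assumed classic form, and summing the
variants' dfs is only an approximation (it double-counts
documents that contain more than one variant).

```python
import math

def idf(doc_freq, num_docs):
    # Assumed idf shape: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def expansion_weights(variant_dfs):
    # Inner ordering: a variant's weight is its df relative to the
    # most common variant, so common spellings rank first.
    top = max(variant_dfs.values())
    return {term: df / top for term, df in variant_dfs.items()}

def group_idf(variant_dfs, num_docs):
    # Outer ordering: the whole expansion counts as one pseudo-term.
    # Summing dfs is an approximation of the group's document count.
    combined = min(sum(variant_dfs.values()), num_docs)
    return idf(combined, num_docs)

def score(doc_terms, clauses, num_docs):
    # clauses: list of {term: df}; a plain term is a one-entry dict.
    total = 0.0
    for variant_dfs in clauses:
        weights = expansion_weights(variant_dfs)
        matched = [weights[t] for t in variant_dfs if t in doc_terms]
        if matched:
            total += group_idf(variant_dfs, num_docs) * max(matched)
    return total

# Hypothetical index statistics for: cheapest accomodation~ london
NUM_DOCS = 100_000
clauses = [
    {"cheapest": 800},
    {"accommodation": 5000, "accomodation": 40},
    {"london": 3000},
]

common = score({"cheapest", "accommodation", "london"}, clauses, NUM_DOCS)
rare = score({"cheapest", "accomodation", "london"}, clauses, NUM_DOCS)
print(common > rare)
```

Under these assumptions the common spelling beats the rare one
inside the expansion, while a rarer query word such as "cheapest"
still contributes more than the expanded term as a whole.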

Cheers
Mark
