At 10:19 PM -0400 5/22/01, Geoff Hutchison wrote:
>On Tue, 22 May 2001, Gilles Detillieux wrote:
>
>>  > For example, the pages which were the top matches
>>  > with version 3.1.5 are way down on the list with 3.2.0b3,
>>  > even though the <TITLE> of those pages contain the
>>  > keywords being searched for!
>>
>>  Well, I'm not aware of changes in 3.2 that would alter relative weights
>>  of different types of words that drastically.
>
>I am. Basically I haven't changed the weighting much at all, though some
>drastic changes in the weighting formula really need to be done. (For
>example, right now, no version of ht://Dig weights words by word
>frequency--you'd like to essentially ignore common words and favor rare
>words.)

Actually, for web searching, that's not such a good idea.  Term 
Frequency / Inverse Document Frequency (TF-IDF) weighting works best 
for nice fancy faceted searches such as those created by librarians 
and information brokers.  For short web searches, averaging 1 to 3 
words, I find it's best to weight phrase matches much more heavily, 
with additional weights for titles, keywords and meta descriptions.  
Then, if you can also show context and highlight the hits, you get 
pretty good results.
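A minimal sketch of the kind of scoring described above, assuming a document split into title, keywords, meta description and body fields. The field weights and the phrase multiplier (FIELD_WEIGHTS, PHRASE_BONUS) are hypothetical values chosen for illustration; they are not ht://Dig's configuration or any other engine's actual formula, and no TF-IDF term is included.

```python
import re
from dataclasses import dataclass

# Hypothetical per-field weights (illustrative only, not ht://Dig settings).
FIELD_WEIGHTS = {"title": 10.0, "keywords": 5.0, "description": 3.0, "body": 1.0}
# Hypothetical multiplier applied to a field when the whole query appears in it as an exact phrase.
PHRASE_BONUS = 4.0


@dataclass
class Document:
    title: str = ""
    keywords: str = ""
    description: str = ""
    body: str = ""


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def score(doc, query):
    """Weighted term hits per field, boosted when the query matches as a phrase."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(getattr(doc, field))
        # Count every occurrence of every query term in this field.
        hits = sum(tokens.count(t) for t in terms)
        field_score = weight * hits
        # Multi-word queries get a large boost when the words appear together, in order.
        if len(terms) > 1 and contains_phrase(tokens, terms):
            field_score *= PHRASE_BONUS
        total += field_score
    return total


a = Document(title="Search engine tools", body="tools for search")
b = Document(body="engine search and more search engine text")
print(score(a, "search engine"), score(b, "search engine"))  # 81.0 16.0
```

Here document `a`, with the phrase in its title, outranks document `b`, even though `b` mentions both query words more often in its body.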

Avi
-- 
_________________________________________________
Complete Guide to Search Engines for Web Sites, Intranets, 
   and Portals: <http://www.searchtools.com>

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
