At 4:35 PM -0600 2/23/00, Gilles Detillieux wrote:
>Consider the soundex and metaphone analogy I brought up earlier.
>Any "sound" may have many possible letters or letter combinations to
>produce them.  When applied to long words, you'd have even more possible
>words than for your "éphémère" example above.  But soundex and metaphone
>don't generate ALL possible words.  They look at all the words that have
>been indexed, and record all the canonical forms of these words only,
>so that when you look up a given word, it will also search for other
>words that it knows are in the index that have a similar sound.

Yeah, I think you're right that an on-the-fly fuzzy match isn't going 
to be very fast. Of course, the catch with anything based on the 
soundex or metaphone algorithms is that you have to remember to rerun 
htfuzzy periodically so the fuzzy database keeps up with the index, 
but the lookups would be pretty fast.
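
To make the trade-off concrete, here is a rough Python sketch of the 
precomputed approach Gilles describes: compute a key for every word 
already in the index, store the words under that key, and answer a 
query with one dictionary lookup. This is not the htfuzzy code. The 
names soundex(), build_index() and lookup() are made up for 
illustration, and the key function is plain American Soundex.

```python
from collections import defaultdict

# Standard Soundex digit classes; vowels, h, w and y carry no digit.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex key for an ASCII word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        # h and w do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (key + "000")[:4]


def build_index(indexed_words):
    """Offline step (hypothetical): group the words already indexed by key."""
    index = defaultdict(set)
    for word in indexed_words:
        index[soundex(word)].add(word)
    return index


def lookup(index, word):
    """Query step: return only indexed words that share the query's key."""
    return sorted(index.get(soundex(word), ()))


index = build_index(["Robert", "Rupert", "Rubin", "search"])
print(lookup(index, "Robart"))
```

The lookup never enumerates possible spellings. It can only return 
words that were in the index when build_index() last ran, which is 
why the offline step has to be repeated after the index changes.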

But to echo what Gilles said, you really don't want to be messing 
around in WordList or the parser, especially if you don't know what 
you're doing. I think the Fuzzy class is pretty self-explanatory and 
almost anyone could write a fuzzy class. The key piece for the Soundex 
and Metaphone variety is the generateKey() method. The key piece for 
the Speling and Substring variety is the getWords() method.
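
As a rough illustration of that split, here is a Python sketch of the 
two styles. It is not the htdig source: the real Fuzzy class is C++, 
and every class name, signature and the toy key function below is an 
assumption made for the example. Only the two method roles, 
generate_key() standing in for generateKey() and get_words() for 
getWords(), come from the description above.

```python
class Fuzzy:
    """Hypothetical base class: expand a search word into indexed words."""

    def __init__(self, indexed_words):
        self.indexed_words = list(indexed_words)

    def get_words(self, word):
        raise NotImplementedError


class KeyedFuzzy(Fuzzy):
    """Key-based style: subclasses only need to supply generate_key()."""

    def __init__(self, indexed_words):
        super().__init__(indexed_words)
        # Precomputed table: key -> indexed words that produce that key.
        self.table = {}
        for w in self.indexed_words:
            self.table.setdefault(self.generate_key(w), []).append(w)

    def generate_key(self, word):
        raise NotImplementedError

    def get_words(self, word):
        return list(self.table.get(self.generate_key(word), []))


class ConsonantSkeleton(KeyedFuzzy):
    """Toy key: first letter plus later consonants (NOT Soundex or Metaphone)."""

    def generate_key(self, word):
        w = word.lower()
        return w[:1] + "".join(c for c in w[1:] if c not in "aeiou")


class Substring(Fuzzy):
    """Word-list style: no key at all, get_words() scans the indexed words."""

    def get_words(self, word):
        needle = word.lower()
        return [w for w in self.indexed_words if needle in w.lower()]


words = ["color", "colour", "collar", "recolor"]
print(ConsonantSkeleton(words).get_words("colur"))
print(Substring(words).get_words("color"))
```

In the keyed style the only algorithm-specific code is the key 
function, so a new sound-alike algorithm is one small method. The 
word-list style overrides the expansion itself because the match 
cannot be reduced to comparing a single key.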

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

