Chris Sibert wrote:
I'm not sure what all of the 'advanced features' were also.

Phonetic Searching - probably not important to this application.


Phonetic searching may be achieved by writing your own Analyzer, which instead (or more probably, along with) the plain tokens provides their phonetic codes, e.g. Double Metaphone for English, or the less useful but more familiar Soundex. Phonetic searching increases recall but lowers precision, especially if you use stemmer before phonetic encoding...


One trick to consider if using phonetic encoding is to keep around the histogram of the original words that have been mapped to corresponding phonetic codes. Then, if a query fails to provide satisfactory results, you can provide a useful suggestion based on the most frequent term found in the histogram, with equal phonetic code to the term in the query.

Synonym searching might be desirable, but now that I'm thinking about it,
also likely not important.

Associated Words - sounds very interesting, like 'gold' might return 'metal'
also, etc.

If you plan on using just English text, you may want to check the excellent (and free) WordNet database, which offers also API - both for query expansion and for finding "associated words" (synsets?), or hypernyms like in your example.


--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)




--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to