On 10/14/06, Jong Kim <[EMAIL PROTECTED]> wrote:
Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', and 'caring'. I looked at the Porter stemmer, the Snowball stemmer, and K-stem. All of them provide a method that takes a surface string ('cares') as input and returns its base form/stem, which is 'care' in this example. But it appears that I cannot use these stemmers to generate all of the inflected forms of a given query term.
Stemming is a multi-step, lossy, one-way operation, and it does not surprise me that none of these packages attempts the reverse operation. My suggestion is to create a reverse stemmer yourself by taking the lexicon of your corpus, stemming all the terms, and inverting the map so that each stem points to the set of surface forms that produced it. At query time, a lookup can be performed using a trie or hashtable.

best,
-Mike
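
The inverted-map idea can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical suffix stripper standing in for whichever real stemmer (Porter, Snowball, K-stem) is used at index time, and `build_variant_index` and `expand` are made-up names, not part of any of those packages. A plain dict plays the role of the hashtable.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer; NOT the Porter algorithm."""
    word = word.lower()
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_variant_index(lexicon, stem):
    """Stem every term in the lexicon and invert the map: stem -> surface forms."""
    index = defaultdict(set)
    for term in lexicon:
        index[stem(term)].add(term)
    return dict(index)


def expand(term, index, stem):
    """Return all lexicon terms sharing the query term's stem.

    Falls back to the term itself when its stem never occurred in the lexicon.
    """
    return sorted(index.get(stem(term), {term}))


if __name__ == "__main__":
    # Assumed toy lexicon; in practice this would be the corpus term list.
    lexicon = ["cares", "care", "cared", "caring", "search", "searching"]
    index = build_variant_index(lexicon, toy_stem)
    print(expand("cares", index, toy_stem))
```

Only variants that actually occur in the corpus are returned, which is usually what a search expansion wants. Because stemming is lossy, unrelated words that happen to share a stem will also land in the same bucket, so the expansion is only as precise as the stemmer behind it.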
