On 10/14/06, Jong Kim <[EMAIL PROTECTED]> wrote:
Hi,
I'm looking for a stemmer that is capable of returning all morphological
variants of a query term (to be used for high-recall search). For example,
given a query term of 'cares', I would like to be able to generate 'cares',
'care', 'cared', and 'caring'.
I looked at the Porter stemmer, Snowball stemmer, and the K-stem.
All of them provide a method that takes a surface string ('cares') as an
input and returns its base form/stem, which is 'care' in this example.
But it appears that I can not use the stemmer to generate all of the
inflected forms of a given query term.
Stemming is a multi-step, lossy, one-way operation and it does not
suprised me that none of these packages attempts the reverse
operation.
My suggestion is to create a reverse stemmer yourself by taking the
lexicon of your corpus, stemming all the terms, and inverting the map.
At query time, a lookup can be performed using a trie or hashtable.
best,
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]