Wojtek H a écrit :
Hi all,

Snowball stemmers are part of Lucene, but for few languages only. We
have documents in various languages and so need stemmers for many
languages (in particular polish). One of the ideas is to use ispell
dictionaries. There are ispell dicts for many languages and so this
solution is good for multilingual environment. Maybe this is not
perfect place to ask, but does anyone know about java stemmer using
ispell dicts?
There is aspell-like java spell-checker (Jazzy) but I could not see
how to use it for stemming. We are considering porting part of
postgres tsearch module to java, because tsearch uses ispell dicts for
stemming.
But maybe there is a better way or there are people working on
something like that?
ispell data is nice for phonetic, and for enumerate a huge list of words. The ispell dictionnary is one way : pseudo root => word, it looks hard to build the inverse function, lemme is splitted in multiple affix. But it can be used to find rules, just like http://www.getopt.org/stempel/ do.

M.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to