Anil, I suppose it depends on how complex the language is and what is acceptable for your program. I have written a couple of stemmers, based on papers I have read, that are fairly straightforward and work well for the languages we are using. Your best bet is probably to do a literature search for the languages you are interested in and go from there.
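To give a rough idea of what a "fairly straightforward" stemmer can look like, here is a minimal suffix-stripping sketch in Python. It is not one of the stemmers mentioned above. The suffix list, the `make_stemmer` helper and the `min_stem_len` parameter are all hypothetical and chosen only for illustration; a real rule set would come from the literature on the target language.

```python
from typing import Callable, Sequence

# Hypothetical suffix list, for illustration only. A real list would be
# taken from a published description of the target language's morphology.
EXAMPLE_SUFFIXES = ("amente", "mente", "es", "s")


def make_stemmer(suffixes: Sequence[str], min_stem_len: int = 3) -> Callable[[str], str]:
    """Return a function that strips the longest matching suffix from a word.

    min_stem_len is an assumed safeguard: a suffix is only removed if at
    least that many characters remain, so very short words are left alone.
    """
    # Try longer suffixes first so that "amente" is preferred over "mente".
    ordered = sorted(suffixes, key=len, reverse=True)

    def stem(word: str) -> str:
        lowered = word.lower()
        for suffix in ordered:
            remaining = len(lowered) - len(suffix)
            if lowered.endswith(suffix) and remaining >= min_stem_len:
                return lowered[:-len(suffix)]
        # No rule applied: return the lowercased word unchanged.
        return lowered

    return stem


stem = make_stemmer(EXAMPLE_SUFFIXES)
```

Calling `stem("flores")` strips the `es` suffix, while a short word such as `"mes"` is left as it is because too little would remain. How much language knowledge you need depends mostly on how good the suffix rules and their exceptions have to be for your application.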
I am, of course, assuming stemmers for your languages don't already exist. If your languages are common, there is probably a stemmer available in some form that you can use or adapt. You'd be surprised at what you get from a simple Google search for <lang X> stemmer (without quotes), where lang X is the language you are interested in. Hooking them into Lucene is straightforward, and there are several examples of this in the docs and code.

-Grant

>>> [EMAIL PROTECTED] 06/03/04 04:09PM >>>
Hi,

Can anyone provide some help on writing a stemmer for non-English languages? How proficient must I be in a language for which I wish to write the stemmer?

Regards,
Anil

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]