Real morphology (finding the root for all the forms of a word) might not be that easy in Russian, since Russian words take both prefixes (aspect) and suffixes (case, number, conjugation). But there are already efforts to write stemmers (suffix strippers) for Russian following Porter's model. SNOWBALL (named in tribute to SNOBOL) is a formal language whose main use so far has been writing stemmers for different languages. At present there are rule sets for Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish.
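To make "suffix stripper following Porter's model" concrete, here is a minimal Java sketch. It is NOT the SNOWBALL Russian rule set: the suffix list is a small, hypothetical sample, and the only idea borrowed from the Russian algorithm is the RV region (the part of the word after the first vowel), inside which an ending may be removed. The class name ToyRussianStemmer is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy stemmer: strips one ending, longest match first, inside RV.
// Not the SNOWBALL Russian algorithm, which applies ordered rule groups.
class ToyRussianStemmer {
    private static final String VOWELS = "аеиоуыэюя";

    // Small illustrative sample of inflectional endings; a real rule set is far larger.
    private static final List<String> SUFFIXES = Arrays.asList(
        "ами", "ями", "ого", "его", "ому", "ему",
        "ах", "ях", "ов", "ев", "ой", "ей", "ий", "ый",
        "ая", "яя", "ое", "ее", "ом", "ем",
        "а", "я", "о", "е", "ы", "и", "у", "ю", "ь");

    /** Start index of RV: the region after the first vowel (word length if no vowel). */
    static int rvStart(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (VOWELS.indexOf(word.charAt(i)) >= 0) {
                return i + 1;
            }
        }
        return word.length();
    }

    /** Removes the longest listed suffix that lies entirely inside RV. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int rv = rvStart(w);
        String best = "";
        for (String s : SUFFIXES) {
            boolean insideRv = w.length() - s.length() >= rv;
            if (w.endsWith(s) && insideRv && s.length() > best.length()) {
                best = s;
            }
        }
        return w.substring(0, w.length() - best.length());
    }

    public static void main(String[] args) {
        for (String w : new String[] {"книга", "книги", "книгами", "книгой"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Compile with a UTF-8 source encoding (e.g. `javac -encoding UTF-8`). All four forms of книга ("book") reduce to книг. A real Porter-style stemmer instead tries its rule groups in a fixed order (perfective gerund, reflexive, adjectival, verb and noun endings, as the SNOWBALL Russian rules do), so one longest match is only a rough approximation.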
Some time ago, somebody posted a French stemmer built from SNOWBALL. It seems straightforward to convert all these stemmers for Lucene and maybe include them in the package. The SNOWBALL site is snowball.sf.net, and the latest version of their compiler outputs Java code. I am attaching the Russian SNOWBALL file and its corresponding Java output. This is just the stemmer, though; it does not include the code needed to interface with Lucene.

Best,
Alex

-----Original Message-----
From: Philipp Chudinov [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 07, 2002 1:21 AM
To: Lucene Users List
Subject: Re: Support for russian morphology in Lucene

it's me :) having no ideas about morphology and great wishes to use lucene in russian. nice to see you here. maybe we should try to do things together.

----- Original Message -----
From: "Vadim Solonovich" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Cc: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, March 07, 2002 6:40 AM
Subject: Support for russian morphology in Lucene

> Hi All !
>
> Is there anybody who has any ideas about implementing russian
> morphology in Lucene ?
> Please, let me know.
>
> Thanks in advance.
>
> Vadim Solonovich,
> mailto:[EMAIL PROTECTED]
> http://www.park.ru
> http://garant.park.ru

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
russian.java
stem.sbl