Real morphology (finding the root for all the forms of a word) in
Russian might not be that easy since in Russian you have both prefixes
(aspect) and suffixes (case, number, conjugation) that inflect a word.
However, there are already efforts to write stemmers (suffix strippers)
for Russian following Porter's model. SNOWBALL (a nod to SNOBOL) is a
formal language that has mainly been used to write stemmers for different
languages. Rule sets currently exist for Danish, Dutch, English, French,
German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish.
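To illustrate what a Porter-style suffix stripper does (and does not do),
here is a toy Java sketch. The ending list, the minimum stem length of 3
and the class name ToySuffixStripper are my own assumptions for
illustration; this is NOT the Snowball Russian rule set, which is
considerably more careful about where endings may be removed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ToySuffixStripper {
    // Assumed, heavily simplified list of Russian noun endings (case/number).
    // Not the rules used by the Snowball Russian stemmer.
    private static final List<String> ENDINGS = Arrays.asList(
        "ами", "ями", "ов", "ев", "ах", "ях", "ам", "ям",
        "ой", "ей", "ом", "ем", "а", "я", "ы", "и", "у", "ю", "е", "о");

    // Assumed minimum stem length, so short words are not stripped to nothing.
    private static final int MIN_STEM = 3;

    /** Removes the longest matching ending, if enough of the word remains. */
    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String best = "";
        for (String ending : ENDINGS) {
            if (w.endsWith(ending)
                    && ending.length() > best.length()
                    && w.length() - ending.length() >= MIN_STEM) {
                best = ending;
            }
        }
        return w.substring(0, w.length() - best.length());
    }

    public static void main(String[] args) {
        for (String w : new String[] {"книга", "книги", "книгами", "книгах", "книгу"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

All five forms of книга come out as книг. A prefix such as the perfective
про- in прочитать (vs. читать) is never touched, though, which is why
suffix stripping only approximates real morphology.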

Some time ago, somebody posted a French stemmer built with SNOWBALL. It
seems straightforward to convert all these stemmers for use in Lucene and
maybe include them in the package.

The site for SNOWBALL is snowball.sf.net. The latest version of their
compiler outputs Java code. I am attaching the Russian SNOWBALL file and
its corresponding Java output. This is just the stemmer though and does
not include the needed code for interfacing with Lucene.
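The missing glue is essentially a per-token mapping: take each token the
analyzer produces and replace it with its stem. In Lucene that role would
be played by a TokenFilter subclass, but its exact methods depend on the
Lucene version, so the sketch below stays in plain Java. The Stemmer
interface, StemmingFilter and the placeholder "drop a trailing s" stemmer
are all hypothetical stand-ins, not the generated Russian class or
Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class StemmingFilterSketch {
    /** Hypothetical adapter that a generated stemmer class would be wrapped in. */
    public interface Stemmer {
        String stem(String word);
    }

    /** Wraps a stream of tokens and replaces each one with its stem. */
    public static final class StemmingFilter implements Iterator<String> {
        private final Iterator<String> input;
        private final Stemmer stemmer;

        public StemmingFilter(Iterator<String> input, Stemmer stemmer) {
            this.input = input;
            this.stemmer = stemmer;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            if (!input.hasNext()) {
                throw new NoSuchElementException();
            }
            return stemmer.stem(input.next());
        }
    }

    /** Runs every token through the filter and collects the results. */
    public static List<String> stemAll(List<String> tokens, Stemmer stemmer) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new StemmingFilter(tokens.iterator(), stemmer);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder stemmer standing in for the generated Russian class.
        Stemmer naive = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(stemAll(Arrays.asList("stemmers", "for", "lucene"), naive));
    }
}
```

Keeping the stemmer behind a one-method interface means each generated
SNOWBALL class only needs a small adapter, and the filter code is shared
by all languages.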

Best,

Alex

-----Original Message-----
From: Philipp Chudinov [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, March 07, 2002 1:21 AM
To: Lucene Users List
Subject: Re: Support for russian morphology in Lucene


It's me :) I have no ideas about morphology, but a great wish to use
Lucene in Russian. Nice to see you here. Maybe we should try to do
things together.

----- Original Message -----
From: "Vadim Solonovich" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Cc: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, March 07, 2002 6:40 AM
Subject: Support for russian morphology in Lucene


> Hi All !
>
> Is there anybody who has any ideas about implementing Russian
> morphology in Lucene?
> Please, let me know.
>
> Thanks in advance.
>
> Vadim Solonovich,
>   mailto:[EMAIL PROTECTED]
>   http://www.park.ru
>   http://garant.park.ru


--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

Attachment: russian.java
Description: Binary data

Attachment: stem.sbl
Description: Binary data
