Hi Michal,

Pretty cool. Your work reminds me of what Leo Galambos did a while back:

http://link.springer.com/chapter/10.1007/978-3-540-39985-8_22

I believe his implementation is still available in the Egothor search
engine project.

Dawid



On Wed, Oct 23, 2013 at 5:17 PM, Michal Hlavac <hla...@hlavki.eu> wrote:
> Hi,
>
> I rewrote the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) in Java.
> It was originally written in C#.
> LemmaGen uses rules to lemmatize words. The algorithm is described here:
> http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
>
> The project is licensed under GPLv3. Sources are hosted on Bitbucket:
> https://bitbucket.org/hlavki/jlemmagen
>
> There is also the Lemmagen4j project, which uses more memory and comes without
> prebuilt trees.
>
> I also obtained licensed dictionaries to build rule trees for 15 languages.
> The dictionaries are licensed, but the prebuilt trees are not.
> You can also build your own dictionary.
>
> The project also contains a TokenFilter for Lucene/Solr.
> The project is not stable yet, but any feedback is appreciated.
>
> Supported languages are:
> mlteast-bg - Bulgarian
> mlteast-cs - Czech
> mlteast-en - English
> mlteast-et - Estonian
> mlteast-fr - French
> mlteast-hu - Hungarian
> mlteast-mk - Macedonian
> mlteast-pl - Polish
> mlteast-ro - Romanian
> mlteast-ru - Russian
> mlteast-sk - Slovak
> mlteast-sl - Slovene
> mlteast-sr - Serbian
> mlteast-uk - Ukrainian
>
> thanks, miso
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
