This is very cool! Lemmatization is an important tool for making search work better.

Would you consider changing the licensing to the Apache 2.0 license?

On 10/23/2013 08:17 AM, Michal Hlavac wrote:
Hi,

I ported the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) to Java.
The original is written in C#.
LemmaGen uses rules to lemmatize words. The algorithm is described here:
http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
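
To give a rough idea of what "rules" means here, below is a minimal sketch of
suffix-rule lemmatization. It is only an illustration and not the LemmaGen
algorithm or the jlemmagen API: the class name SuffixRuleLemmatizer and the
four English rules are hypothetical and hand-written, and picking the longest
matching suffix is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixRuleLemmatizer {
    // Hypothetical hand-written rules (word suffix -> replacement).
    // These are illustrative only and are not taken from LemmaGen.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ies", "y");
        RULES.put("ing", "");
        RULES.put("ed", "");
        RULES.put("s", "");
    }

    // Applies the rule with the longest suffix that matches the word.
    // Words that match no rule are returned unchanged.
    public static String lemmatize(String word) {
        String bestSuffix = null;
        for (String suffix : RULES.keySet()) {
            boolean matches = word.length() > suffix.length() && word.endsWith(suffix);
            if (matches && (bestSuffix == null || suffix.length() > bestSuffix.length())) {
                bestSuffix = suffix;
            }
        }
        if (bestSuffix == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - bestSuffix.length());
        return stem + RULES.get(bestSuffix);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"studies", "walking", "jumped", "cats", "fish"}) {
            System.out.println(w + " -> " + lemmatize(w));
        }
    }
}
```

A fixed table like this breaks quickly on real vocabulary, so rule sets for
real use are normally built from a dictionary of word/lemma pairs instead of
being written by hand.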

The project is licensed under GPLv3. The sources are hosted on Bitbucket:
https://bitbucket.org/hlavki/jlemmagen

There is also the Lemmagen4j project, which uses more memory and comes without
prebuilt trees.

I also obtained licensed dictionaries to build rule trees for 15 languages.
The dictionaries are licensed, but the prebuilt trees are not.
You can also build your own dictionary.

The project also contains a TokenFilter for Lucene/Solr.
The project is not stable yet, but any feedback is appreciated.

Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian

thanks, miso


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


