This is very cool! Lemmatization is an important tool for making search
work better.
Would you consider changing the licensing to the Apache 2.0 license?
On 10/23/2013 08:17 AM, Michal Hlavac wrote:
Hi,
I ported the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) to Java.
It was originally written in C#.
LemmaGen uses rules to lemmatize words. The algorithm is described here:
http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
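To give a rough idea of what "rules" means here, below is a minimal sketch of suffix-rule lemmatization in general: each rule replaces a word ending with another string, and the longest matching ending wins. The class name SuffixRuleLemmatizer and the example rules are hypothetical, made up for illustration. This is not jlemmagen's API or its tree format; see the paper for the actual algorithm.

```java
import java.util.Map;

// Hypothetical illustration only: NOT the jlemmagen API or its tree format.
final class SuffixRuleLemmatizer {
    // Maps a word ending to its replacement, e.g. "ies" -> "y".
    private final Map<String, String> rules;

    SuffixRuleLemmatizer(Map<String, String> rules) {
        this.rules = rules;
    }

    // Tries suffixes from longest to shortest, so the most specific rule wins.
    String lemmatize(String word) {
        for (int i = 0; i < word.length(); i++) {
            String suffix = word.substring(i);
            String replacement = rules.get(suffix);
            if (replacement != null) {
                return word.substring(0, i) + replacement;
            }
        }
        // No rule matched: return the word unchanged.
        return word;
    }
}
```

With the made-up rules "ies" -> "y", "ing" -> "" and "s" -> "", lemmatize("studies") takes the "ies" rule rather than the shorter "s" rule, because longer endings are tried first.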
The project is licensed under GPLv3. The sources are hosted on Bitbucket:
https://bitbucket.org/hlavki/jlemmagen
There is also the Lemmagen4j project, which uses more memory and does not
come with prebuilt trees.
I also obtained licensed dictionaries to build rule trees for 15 languages.
The dictionaries are license-restricted, but the prebuilt trees are not.
You can also build your own dictionary.
The project also contains a TokenFilter for Lucene/Solr.
The project is not stable yet, but any feedback is appreciated.
Supported languages are:
mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian
thanks, miso
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org