Hi,

I have ported the LemmaGen lemmatizer project (http://lemmatise.ijs.si/) to Java. The original is written in C#. LemmaGen uses rules to lemmatize words. The algorithm is described here: http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
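For anyone unfamiliar with rule-based lemmatization, here is a minimal sketch of the general idea: the longest matching suffix rule rewrites the word ending. This is not the jLemmaGen API and not the algorithm from the paper, which learns its rules from a dictionary. The class name, method names and the three English rules below are made up purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-written suffix-rule lemmatizer.
// Real rule trees are learned from dictionaries, not hard-coded like this.
class SuffixRuleLemmatizer {
    // Maps a word suffix to the text that replaces it (hypothetical rules).
    private final Map<String, String> rules = new LinkedHashMap<>();

    SuffixRuleLemmatizer addRule(String suffix, String replacement) {
        rules.put(suffix, replacement);
        return this;
    }

    // Applies the rule with the longest suffix that matches the word.
    // Words that match no rule are returned unchanged.
    String lemmatize(String word) {
        String best = null;
        for (String suffix : rules.keySet()) {
            boolean longer = best == null || suffix.length() > best.length();
            if (word.endsWith(suffix) && longer) {
                best = suffix;
            }
        }
        if (best == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - best.length());
        return stem + rules.get(best);
    }

    public static void main(String[] args) {
        SuffixRuleLemmatizer lemmatizer = new SuffixRuleLemmatizer()
                .addRule("ing", "")
                .addRule("ies", "y")
                .addRule("s", "");
        for (String word : new String[] {"walking", "cities", "dogs", "tree"}) {
            System.out.println(word + " -> " + lemmatizer.lemmatize(word));
        }
    }
}
```

Preferring the longest suffix lets a specific rule such as "ies" override a general one such as "s", which is the same intuition as a general rule with more specific exceptions.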
The project is licensed under GPLv3. The sources are hosted on Bitbucket: https://bitbucket.org/hlavki/jlemmagen

There is also a Lemmagen4j project, which uses more memory and comes without prebuilt trees.

I also obtained licensed dictionaries to build rule trees for 15 languages. The dictionaries are licensed, but the prebuilt trees are not. You can also build your own dictionary.

The project also contains a TokenFilter for Lucene/Solr.

The project is not stable yet, but any feedback is appreciated.

Supported languages are:

mlteast-bg - Bulgarian
mlteast-cs - Czech
mlteast-en - English
mlteast-et - Estonian
mlteast-fr - French
mlteast-hu - Hungarian
mlteast-mk - Macedonian
mlteast-pl - Polish
mlteast-ro - Romanian
mlteast-ru - Russian
mlteast-sk - Slovak
mlteast-sl - Slovene
mlteast-sr - Serbian
mlteast-uk - Ukrainian

thanks,
miso