Hi Michal,

Pretty cool. Your work reminds me of what Leo Galambos did a while back:
http://link.springer.com/chapter/10.1007/978-3-540-39985-8_22

I believe his implementation is still available in the Egothor search engine project.

Dawid

On Wed, Oct 23, 2013 at 5:17 PM, Michal Hlavac <hla...@hlavki.eu> wrote:
> Hi,
>
> I rewrote the lemmatizer project LemmaGen (http://lemmatise.ijs.si/) in Java.
> It was originally written in C#.
> LemmaGen uses rules to lemmatize words. The algorithm is described here:
> http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
>
> The project is released under GPLv3. Sources are hosted on Bitbucket:
> https://bitbucket.org/hlavki/jlemmagen
>
> There is also the Lemmagen4j project, which uses more memory and comes
> without prebuilt trees.
>
> I also obtained licensed dictionaries to build rule trees for 15 languages.
> The dictionaries are licensed, but the prebuilt trees are not.
> But you can also build your own dictionary.
>
> The project also contains a TokenFilter for Lucene/Solr.
> The project is not stable yet, but any feedback is appreciated.
>
> Supported languages are:
> mlteast-bg - Bulgarian
> mlteast-cs - Czech
> mlteast-en - English
> mlteast-et - Estonian
> mlteast-fr - French
> mlteast-hu - Hungarian
> mlteast-mk - Macedonian
> mlteast-pl - Polish
> mlteast-ro - Romanian
> mlteast-ru - Russian
> mlteast-sk - Slovak
> mlteast-sl - Slovene
> mlteast-sr - Serbian
> mlteast-uk - Ukrainian
>
> thanks, miso
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org