It is not only licensing: I think we also try to keep OpenNLP free of
external dependencies, and Morfologik has some dependencies of its own.


2016-07-15 4:55 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:

> Great stuff, William.
>
> I have been using Morfologik stemming for a long time and when we
> included it we put it as an addon. I assume that the reason was its
> license, but reading the Morfologik license it is not clear to me why it
> is not Apache compatible.
>
> If it is, it would be nice to include it directly in OpenNLP.
>
> Can anyone shed any light on this?
>
> Thanks,
>
> R
>
> On Fri, Jul 15, 2016 at 12:02 AM, William Colen <william.co...@gmail.com>
> wrote:
> > Hello,
> >
> > A while back we started working on a Morfologik Addon.
> >
> > http://svn.apache.org/viewvc/opennlp/addons/
> >
> > I checked it out last week and noticed it was outdated, especially
> > because it was not using the latest Morfologik version. It was also
> > missing documentation.
> >
> > You can find more about Morfologik here:
> > https://github.com/morfologik/morfologik-stemming
> >
> > Morfologik provides tools for finite state automata (FSA) construction
> > and dictionary-based morphological dictionaries.
> >
> > The Morfologik Addon implements some OpenNLP interfaces and extends some
> > classes to make it easier to use Morfologik FSA dictionaries:
> >
> >    - opennlp.morfologik.tagdict.MorfologikPOSTaggerFactory
> >       - Extends: opennlp.tools.postag.POSTaggerFactory
> >       - Helps create a POSTagger model with an embedded TagDictionary
> >       based on FSA
> >    - opennlp.morfologik.tagdict.MorfologikTagDictionary
> >       - Implements: opennlp.tools.postag.TagDictionary
> >       - A TagDictionary based on FSA is much smaller than the default
> >       XML-based one, and consumes less memory.
> >    - opennlp.morfologik.lemmatizer.MorfologikLemmatizer
> >       - Implements: opennlp.tools.lemmatizer.DictionaryLemmatizer
> >       - A dictionary-based lemmatizer that uses an FSA dictionary.
> >
> > It also provides a command line interface with the following tools:
> >
> >    - MorfologikDictionaryBuilder
> >       - builds a binary POS Dictionary using Morfologik
> >    - XMLDictionaryToTable
> >       - reads an OpenNLP XML tag dictionary and outputs it as a
> >       tab-separated file that can be built into an FSA dictionary
> >
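To make the XMLDictionaryToTable step a bit more concrete, here is a minimal, self-contained sketch of the idea: flattening a word-to-tags mapping into tab-separated lines. It does not use the addon or OpenNLP classes. The class name TagTableSketch, the method toTable, and the two-column layout (word, then tag) are assumptions for illustration only; the real tool may emit different columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the addon's actual XMLDictionaryToTable.
public class TagTableSketch {

    /**
     * Flattens a word -> tags mapping into tab-separated lines,
     * one line per (word, tag) pair. The column layout is assumed.
     */
    public static List<String> toTable(Map<String, List<String>> tagDictionary) {
        List<String> lines = new ArrayList<>();
        // Copy into a TreeMap so the words come out in sorted, stable order.
        for (Map.Entry<String, List<String>> entry
                : new TreeMap<>(tagDictionary).entrySet()) {
            for (String tag : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + tag);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up entries standing in for a parsed XML tag dictionary.
        Map<String, List<String>> dict = new HashMap<>();
        dict.put("casa", Arrays.asList("N", "V"));
        dict.put("a", Arrays.asList("ART", "PRP"));
        for (String line : toTable(dict)) {
            System.out.println(line);
        }
    }
}
```

A table like this would then be the input to the dictionary builder; the exact columns and separator that builder expects should be checked against the addon documentation.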
> >
> > In a project I developed it was of great help. The tag dictionary for the
> > POS tagger was huge (something like 50 MB), requiring a lot of memory.
> > Migrating it to an FSA dictionary not only gave a smaller model, but also
> > let me use the model without increasing the JVM memory.
> >
> > More here:
> >
> > https://cwiki.apache.org/confluence/display/OPENNLP/FSA+Dictionary+with+morfologik-addon
> >
> > Hope it will be helpful.
> >
> > William
>
