It is not only licensing. I think we also try to keep OpenNLP free of external dependencies, and Morfologik has some dependencies of its own.
2016-07-15 4:55 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:

> Great stuff, William.
>
> I have been using Morfologik stemming for a long time, and when we
> included it we put it in as an addon. I assume the reason was its
> license, but reading the Morfologik license it is not clear to me why
> it is not Apache compatible.
>
> If it is, it would be nice to include it directly in OpenNLP.
>
> Can anyone shed any light on this?
>
> Thanks,
>
> R
>
> On Fri, Jul 15, 2016 at 12:02 AM, William Colen <william.co...@gmail.com>
> wrote:
> > Hello,
> >
> > A while back we started working on a Morfologik Addon.
> >
> > http://svn.apache.org/viewvc/opennlp/addons/
> >
> > I checked it out last week and noticed it was outdated, especially
> > because it was not using the latest Morfologik version. It was also
> > missing documentation.
> >
> > You can find more about Morfologik here:
> > https://github.com/morfologik/morfologik-stemming
> >
> > Morfologik provides tools for finite state automata (FSA) construction
> > and dictionary-based morphological dictionaries.
> >
> > The Morfologik Addon implements some OpenNLP interfaces and extends
> > some classes to make it easier to use Morfologik FSA dictionaries:
> >
> > - opennlp.morfologik.tagdict.MorfologikPOSTaggerFactory
> >   - Extends: opennlp.tools.postag.POSTaggerFactory
> >   - Helps create a POSTagger model with an embedded TagDictionary
> >     based on FSA
> > - opennlp.morfologik.tagdict.MorfologikTagDictionary
> >   - Implements: opennlp.tools.postag.TagDictionary
> >   - A TagDictionary based on FSA is much smaller than the default
> >     XML-based one and consumes less memory.
> > - opennlp.morfologik.lemmatizer.MorfologikLemmatizer
> >   - Implements: opennlp.tools.lemmatizer.DictionaryLemmatizer
> >   - A dictionary-based lemmatizer that uses an FSA dictionary.
> >
> > It also provides a command line interface with these tools:
> >
> > - MorfologikDictionaryBuilder
> >   - builds a binary POS dictionary using Morfologik
> > - XMLDictionaryToTable
> >   - reads an OpenNLP XML tag dictionary and outputs it as a
> >     tab-separated file that can be built into an FSA dictionary
> >
> > In a project I developed it was of great help. The tag dictionary for
> > the POS tagger was huge (something like 50 MB) and required a lot of
> > memory. Migrating it to an FSA dictionary not only gave a smaller
> > model, it also let me use the model without increasing the JVM memory.
> >
> > More here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/FSA+Dictionary+with+morfologik-addon
> >
> > Hope it will be helpful.
> >
> > William
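
For readers who want a feel for the XMLDictionaryToTable step quoted above, here is a rough sketch of that kind of conversion. It is not the addon's code. It uses only the JDK XML parser, and it assumes the tag dictionary XML has `entry` elements with a space-separated `tags` attribute and a `token` child. The class name `TagDictToTable`, the method `toRows`, and the two-column "word TAB tag" layout are made up for illustration; the real tool may emit different columns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenNLP or the Morfologik addon.
public class TagDictToTable {

    // Turns each <entry tags="A B"><token>w</token></entry> into one
    // "w<TAB>A" row per tag. The XML shape is an assumption (see above).
    static List<String> toRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse dictionary XML", e);
        }

        List<String> rows = new ArrayList<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            Node tokenNode = entry.getElementsByTagName("token").item(0);
            if (tokenNode == null) {
                continue; // skip entries without a token
            }
            String token = tokenNode.getTextContent().trim();
            for (String tag : entry.getAttribute("tags").trim().split("\\s+")) {
                if (!tag.isEmpty()) {
                    rows.add(token + "\t" + tag);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<dictionary>"
                + "<entry tags=\"NN VB\"><token>walk</token></entry>"
                + "</dictionary>";
        toRows(xml).forEach(System.out::println);
    }
}
```

A table like this would then be handed to a dictionary builder such as the MorfologikDictionaryBuilder tool mentioned above to produce the binary FSA dictionary.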