Richard, I believe I found the problem with the parser; would you mind taking a look?
This PR should fix it:
https://github.com/apache/opennlp/pull/199

Jörn

On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> Hi Rodrigo,
>
> On 15.05.2017, at 15:36, Rodrigo Agerri <rage...@apache.org> wrote:
> >
> > I cannot reproduce the lemmatizer issue. Could you please share your
> > training data?
>
> I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest
> in DKPro Core [1]. It happens when I change the OpenNLP version in the POM
> from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo, of
> course).
> Unfortunately, it's not a simple minimal OpenNLP-only unit test; it
> makes use of the respective DKPro Core UIMA components.
>
> The data that is used is the GUM 3.0.0 corpus, specifically the CoNLL
> files in it [2].
>
> The corpus can be downloaded from:
> https://github.com/amir-zeldes/gum/archive/V3.0.0.zip
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizerTrainerTest.java
> [2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-api-datasets-asl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml