Richard, I believe I found the problem with the parser. Would you mind taking
a look?

This PR should fix it:
https://github.com/apache/opennlp/pull/199

Jörn

On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> Hi Rodrigo,
>
> On 15.05.2017, at 15:36, Rodrigo Agerri <rage...@apache.org> wrote:
> >
> > I cannot reproduce the lemmatizer issue. Could you please share your
> > training data?
>
> I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest
> in DKPro Core [1]. It happens when I change the OpenNLP version in the POM
> from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo, of
> course).
> Unfortunately, it's not a simple, minimal OpenNLP-only unit test; it makes
> use of the respective DKPro Core UIMA components.
>
> The data that is used is the GUM 3.0.0 corpus, specifically the CoNLL
> files in it [2].
>
> The corpus can be downloaded from: https://github.com/amir-zeldes/gum/archive/V3.0.0.zip
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizerTrainerTest.java
> [2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-api-datasets-asl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml
