Hello Richard,

I have tried with various corpora, including GUM, but I cannot reproduce
that error.

https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41d5027bddad

Please note that commit O-904 changed (broke) the lemmatizer API
substantially to make it uniform between DictionaryLemmatizer and
LemmatizerME (e.g., the decoding of lemmas is now done internally, and so
on), so this line for tagging with the LemmatizerME is no longer required:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135

Also, that commit changed the LemmaSampleStream and LemmaSample classes, so
it is possible that the change is affecting this class:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java

Do I understand the logic of this class correctly: as it stands, it takes an
already encoded SES and tries to encode it again?

Could you please take a look and see if that could be the problem?

Cheers,

Rodrigo

On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org>
wrote:

> > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote:
> >
> > Richard, I believe I found the problem with the parser, would you mind to
> > take a look?
> >
> > This PR should fix it:
> > https://github.com/apache/opennlp/pull/199
>
> The parser test works nicely with the PR.
>
> The lemmatizer test still behaves strange.
>
> Cheers,
>
> -- Richard
>
>
