Hello Richard, I have tried with various corpora, including GUM, but I cannot reproduce that error.
https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41d5027bddad

Please note that commit O-904 changed (broke) the lemmatizer API substantially to make it uniform between DictionaryLemmatizer and LemmatizerME (e.g., the decoding of lemmas is now done internally), so this line for tagging with the LemmatizerME is no longer required:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135

Also, that commit changed the LemmaSampleStream and LemmaSample classes, so it may be affecting this class:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java

If I understand the logic of this class correctly, as it stands it will take an already encoded SES and try to encode it again?

Could you please take a look and see if that could be the problem?

Cheers,

Rodrigo

On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote:
> >
> > Richard, I believe I found the problem with the parser, would you mind to
> > take a look?
> >
> > This PR should fix it:
> > https://github.com/apache/opennlp/pull/199
>
> The parser test works nicely with the PR.
>
> The lemmatizer test still behaves strange.
>
> Cheers,
>
> -- Richard