Hi Richard,

I cloned the DKPro code and tried Rodrigo's proposed changes. Your test passes
with them.

Thank you
William

2017-05-15 18:51 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:

> Hello Richard,
>
> I have tried with various corpora, including GUM, but I cannot reproduce
> that error.
>
> https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41d5027bddad
>
> Please note that commit O-904 changed (broke) the lemmatizer API
> substantially to make it uniform between DictionaryLemmatizer and the
> LemmatizerME (e.g., doing the decoding of lemmas internally and so on), so
> that this line for tagging with the LemmatizerME is no longer required:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135
>
> Also, that commit changed the LemmaSampleStream and LemmaSample classes, so
> it is possible that this is affecting the following class:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java
>
> If I understand the logic of this class correctly, as it stands it will take
> an already encoded SES and try to encode it again?
>
> Could you please take a look and see if that could be the problem?
>
> Cheers,
>
> Rodrigo
>
> On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> > > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote:
> > >
> > > Richard, I believe I found the problem with the parser, would you mind
> > > taking a look?
> > >
> > > This PR should fix it:
> > > https://github.com/apache/opennlp/pull/199
> >
> > The parser test works nicely with the PR.
> >
> > The lemmatizer test still behaves strangely.
> >
> > Cheers,
> >
> > -- Richard
> >
> >
>
