Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Rodrigo Agerri
Hello Richard,

I have tried with various corpora, including GUM, but I cannot reproduce
that error.

https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41
d5027bddad

Please note that commit O-904 changed (broke) the lemmatizer API
substantially to make it uniform between DictionaryLemmatizer and the
LemmatizerME (e.g., doing the decoding of lemmas internally and so on) so
that this line for tagging with the LemmatizerME is not required:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135

Also, that commit changed the LemmaSampleStream and LemmaSample classes, so
it is possible that is affecting this class:

https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java

I understand the logic of this class correctly as it stands it will take an
already encoded SES and will try to encoded it again?

Could you please take a look and see if that could be the problem?

Cheers,

Rodrigo

On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho 
wrote:

> > On 15.05.2017, at 16:35, Joern Kottmann  wrote:
> >
> > Richard, I believe I found the problem with the parser, would you mind to
> > take a look?
> >
> > This PR should fix it:
> > https://github.com/apache/opennlp/pull/199
>
> The parser test works nicely with the PR.
>
> The lemmatizer test still behaves strange.
>
> Cheers,
>
> -- Richard
>
>


[GitHub] opennlp pull request #201: Opennlp 1060

2017-05-15 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/201

Opennlp 1060

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with OPENNLP- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1060

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #201


commit 54432f8bc660d0ce64acc6569609644ae9bb4848
Author: beylerian 
Date:   2017-05-13T17:37:01Z

OPENNLP-1058: Update README.md to cover more

closes #198

commit 8ae5107f77453f13959525edee8694a6fefac2e6
Author: Jörn Kottmann 
Date:   2017-05-15T19:54:40Z

OPENNLP-1060: Fix computation of hash for the parser




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Good to hear, the parser eval test also had a bug (O-1060), we will fix
this now as well before the next RC,
this should prevent that this happens again.

And thanks again for finding this!

Now we need to find the problem with the lemmatizer before we can build the
next RC.

Jörn



On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho 
wrote:

> > On 15.05.2017, at 16:35, Joern Kottmann  wrote:
> >
> > Richard, I believe I found the problem with the parser, would you mind to
> > take a look?
> >
> > This PR should fix it:
> > https://github.com/apache/opennlp/pull/199
>
> The parser test works nicely with the PR.
>
> The lemmatizer test still behaves strange.
>
> Cheers,
>
> -- Richard
>
>


[GitHub] opennlp pull request #200: OPENNLP-1057: Add all Eval tests to the Eval prof...

2017-05-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/200


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #198: OPENNLP-1058: updated README.md

2017-05-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/198


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #200: OPENNLP-1057: Add all Eval tests to the Eval prof...

2017-05-15 Thread thygesen
GitHub user thygesen opened a pull request:

https://github.com/apache/opennlp/pull/200

OPENNLP-1057: Add all Eval tests to the Eval profile

Enabled all Eval tests
---
Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [√ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [√ ] Does your PR title start with OPENNLP- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [√ ] Has your PR been rebased against the latest commit within the 
target branch (typically master)?

- √[ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [√ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [n/a ] Have you written or updated unit tests to verify your changes?
- [ n/a] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [n/a ] If applicable, have you updated the LICENSE file, including the 
main LICENSE file in opennlp folder?
- [n/a ] If applicable, have you updated the NOTICE file, including the 
main NOTICE file found in opennlp folder?

### For documentation related changes:
- [n/a ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thygesen/opennlp opennlp-1057

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #200


commit 940f8bc4a267ea24e7e235da2a32896c5271d132
Author: thygesen 
Date:   2017-05-15T18:07:14Z

OPENNLP-1057: Add all Eval tests to the Eval profile




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #199: OPENNLP-1059 Set model version before creating th...

2017-05-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/199


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Richard Eckart de Castilho
> On 15.05.2017, at 16:35, Joern Kottmann  wrote:
> 
> Richard, I believe I found the problem with the parser, would you mind to
> take a look?
> 
> This PR should fix it:
> https://github.com/apache/opennlp/pull/199

The parser test works nicely with the PR.

The lemmatizer test still behaves strange.

Cheers,

-- Richard



Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Richard, I believe I found the problem with the parser, would you mind to
take a look?

This PR should fix it:
https://github.com/apache/opennlp/pull/199

Jörn

On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho 
wrote:

> Hi Rodrigo,
>
> On 15.05.2017, at 15:36, Rodrigo Agerri  wrote:
> >
> > I cannot reproduce the lemmatizer issue. Could you please share your
> > training data?
>
> I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest
> in DKPro Core [1]. It happens when I change the OpenNLP version in the POM
> from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo of
> course).
> Unfortunately, it's not a simple minimal OpenNLP-only unit test, but it
> makes used
> of the respective DKPro Core UIMA components.
>
> The data that is used is the GUM 3.0.0 corpus, specifically the CoNLL
> files in it [2].
>
> The corpus can be downloaded from: https://github.com/amir-
> zeldes/gum/archive/V3.0.0.zip
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-core/blob/
> 89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-
> asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/
> OpenNlpLemmatizerTrainerTest.java
> [2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-
> core-api-datasets-asl/src/main/resources/de/tudarmstadt/
> ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml


[GitHub] opennlp pull request #199: OPENNLP-1059 Set model version before creating th...

2017-05-15 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/199

OPENNLP-1059 Set model version before creating the POS Model

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with OPENNLP- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1059

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #199


commit 108fa9a93c2cd126a138f8813390e197d0a3584e
Author: Jörn Kottmann 
Date:   2017-05-15T14:04:58Z

OPENNLP-1059 Set model version before creating the POS Model




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Hello Richard,

thanks for reporting this. For 1.8.0 we replaced a Heap with a SortedSet
[1]. In this commit there is one loop [2] which iterates through the parses
which will be advanced. The order of the Parsers in the Heap was not so
well defined, therefore we decided to sort them by probability.
We also noticed that this change is changing the output of the parser with
the existing models in our SourceForge model eval test [3].

After running the evaluation on the OntoNotes4 data set I only got  very
small change and decided it is ok to do this. I am not aware of how big the
change is but is was less than the delta in test case [4] of 0.001.

What do you think? Should this be rolled back?

Anyway, that said, about the parser, I still need to understand what
happened with the lemmatizer.

Jörn

[1]
https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c
[2]
https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/main/java/opennlp/tools/parser/AbstractBottomUpParser.java#L285
[3]
https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c#diff-a5834f32b8a41b76a336126e4b13d4f7L349
[4]
https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70

On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilho  wrote:

> Hi all,
>
> > On 11.05.2017, at 18:37, Joern Kottmann  wrote:
> >
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 2.
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> models are used during classification?
>
> E.g. the English parser model seems to create different POS tags now
> for the sentence "We need a very complicated example sentence ,
> which contains as many constituents and dependencies as possible .".
> "a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT".
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> training data is used during training?
>
> I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
> this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.
>
> Cheers,
>
> -- Richard
>
>
>