Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
Hello Richard, I have tried with various corpora, including GUM, but I cannot reproduce that error. https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41 d5027bddad Please note that commit O-904 changed (broke) the lemmatizer API substantially to make it uniform between DictionaryLemmatizer and the LemmatizerME (e.g., doing the decoding of lemmas internally and so on) so that this line for tagging with the LemmatizerME is not required: https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135 Also, that commit changed the LemmaSampleStream and LemmaSample classes, so it is possible that is affecting this class: https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java I understand the logic of this class correctly as it stands it will take an already encoded SES and will try to encoded it again? Could you please take a look and see if that could be the problem? Cheers, Rodrigo On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilhowrote: > > On 15.05.2017, at 16:35, Joern Kottmann wrote: > > > > Richard, I believe I found the problem with the parser, would you mind to > > take a look? > > > > This PR should fix it: > > https://github.com/apache/opennlp/pull/199 > > The parser test works nicely with the PR. > > The lemmatizer test still behaves strange. > > Cheers, > > -- Richard > >
[GitHub] opennlp pull request #201: Opennlp 1060
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/201 Opennlp 1060 Thank you for contributing to Apache OpenNLP. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Does your PR title start with OPENNLP- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Is your initial contribution a single, squashed commit? ### For code changes: - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp opennlp-1060 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/opennlp/pull/201.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #201 commit 54432f8bc660d0ce64acc6569609644ae9bb4848 Author: beylerianDate: 2017-05-13T17:37:01Z OPENNLP-1058: Update README.md to cover more closes #198 commit 8ae5107f77453f13959525edee8694a6fefac2e6 Author: Jörn Kottmann Date: 2017-05-15T19:54:40Z OPENNLP-1060: Fix computation of hash for the parser --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
Good to hear, the parser eval test also had a bug (O-1060), we will fix this now as well before the next RC, this should prevent that this happens again. And thanks again for finding this! Now we need to find the problem with the lemmatizer before we can build the next RC. Jörn On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilhowrote: > > On 15.05.2017, at 16:35, Joern Kottmann wrote: > > > > Richard, I believe I found the problem with the parser, would you mind to > > take a look? > > > > This PR should fix it: > > https://github.com/apache/opennlp/pull/199 > > The parser test works nicely with the PR. > > The lemmatizer test still behaves strange. > > Cheers, > > -- Richard > >
[GitHub] opennlp pull request #200: OPENNLP-1057: Add all Eval tests to the Eval prof...
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/200 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] opennlp pull request #198: OPENNLP-1058: updated README.md
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/198 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] opennlp pull request #200: OPENNLP-1057: Add all Eval tests to the Eval prof...
GitHub user thygesen opened a pull request: https://github.com/apache/opennlp/pull/200 OPENNLP-1057: Add all Eval tests to the Eval profile Enabled all Eval tests --- Thank you for contributing to Apache OpenNLP. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [â ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [â ] Does your PR title start with OPENNLP- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [â ] Has your PR been rebased against the latest commit within the target branch (typically master)? - â[ ] Is your initial contribution a single, squashed commit? ### For code changes: - [â ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder? - [n/a ] Have you written or updated unit tests to verify your changes? - [ n/a] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [n/a ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder? - [n/a ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder? ### For documentation related changes: - [n/a ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/thygesen/opennlp opennlp-1057 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/opennlp/pull/200.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #200 commit 940f8bc4a267ea24e7e235da2a32896c5271d132 Author: thygesenDate: 2017-05-15T18:07:14Z OPENNLP-1057: Add all Eval tests to the Eval profile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] opennlp pull request #199: OPENNLP-1059 Set model version before creating th...
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/199 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
> On 15.05.2017, at 16:35, Joern Kottmannwrote: > > Richard, I believe I found the problem with the parser, would you mind to > take a look? > > This PR should fix it: > https://github.com/apache/opennlp/pull/199 The parser test works nicely with the PR. The lemmatizer test still behaves strange. Cheers, -- Richard
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
Richard, I believe I found the problem with the parser, would you mind to take a look? This PR should fix it: https://github.com/apache/opennlp/pull/199 Jörn On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilhowrote: > Hi Rodrigo, > > On 15.05.2017, at 15:36, Rodrigo Agerri wrote: > > > > I cannot reproduce the lemmatizer issue. Could you please share your > > training data? > > I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest > in DKPro Core [1]. It happens when I change the OpenNLP version in the POM > from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo of > course). > Unfortunately, it's not a simple minimal OpenNLP-only unit test, but it > makes used > of the respective DKPro Core UIMA components. > > The data that is used is the GUM 3.0.0 corpus, specifically the CoNLL > files in it [2]. > > The corpus can be downloaded from: https://github.com/amir- > zeldes/gum/archive/V3.0.0.zip > > Cheers, > > -- Richard > > [1] https://github.com/dkpro/dkpro-core/blob/ > 89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp- > asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/ > OpenNlpLemmatizerTrainerTest.java > [2] https://github.com/dkpro/dkpro-core/blob/master/dkpro- > core-api-datasets-asl/src/main/resources/de/tudarmstadt/ > ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml
[GitHub] opennlp pull request #199: OPENNLP-1059 Set model version before creating th...
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/199 OPENNLP-1059 Set model version before creating the POS Model Thank you for contributing to Apache OpenNLP. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Does your PR title start with OPENNLP- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Is your initial contribution a single, squashed commit? ### For code changes: - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp opennlp-1059 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/opennlp/pull/199.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #199 commit 108fa9a93c2cd126a138f8813390e197d0a3584e Author: Jörn KottmannDate: 2017-05-15T14:04:58Z OPENNLP-1059 Set model version before creating the POS Model --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
Hello Richard, thanks for reporting this. For 1.8.0 we replaced a Heap with a SortedSet [1]. In this commit there is one loop [2] which iterates through the parses which will be advanced. The order of the Parsers in the Heap was not so well defined, therefore we decided to sort them by probability. We also noticed that this change is changing the output of the parser with the existing models in our SourceForge model eval test [3]. After running the evaluation on the OntoNotes4 data set I only got very small change and decided it is ok to do this. I am not aware of how big the change is but is was less than the delta in test case [4] of 0.001. What do you think? Should this be rolled back? Anyway, that said, about the parser, I still need to understand what happened with the lemmatizer. Jörn [1] https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c [2] https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/main/java/opennlp/tools/parser/AbstractBottomUpParser.java#L285 [3] https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c#diff-a5834f32b8a41b76a336126e4b13d4f7L349 [4] https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70 On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilhowrote: > Hi all, > > > On 11.05.2017, at 18:37, Joern Kottmann wrote: > > > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP > > 1.8.0 Release Candidate 2. > > Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same > models are used during classification? > > E.g. the English parser model seems to create different POS tags now > for the sentence "We need a very complicated example sentence , > which contains as many constituents and dependencies as possible .". > "a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT". > > Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same > training data is used during training? > > I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2, > this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84. > > Cheers, > > -- Richard > > >