Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
With the issues reported by Richard we should cancel the vote and roll back the release.

I change my vote to -1 (binding).

2017-05-13 19:08 GMT-03:00 Richard Eckart de Castilho:

>
> > On 13.05.2017, at 22:35, Richard Eckart de Castilho wrote:
> >
> > Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> > training data is used during training?
> >
> > I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
> > this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.
>
> Also, this test which trains and evaluates a lemmatizer model
> takes ~8 sec with 1.7.2 and ~170 sec with 1.8.0. Even when only
> considering the training phase (no evaluation), the test runs
> much faster with 1.7.2 than with 1.8.0.
>
> Here are some details on the training phase.
>
> It seems odd that the events, outcomes, and predicates change that much.
>
> === 1.7.2
>
> done. 50697 events
> Indexing... done.
> Sorting and merging events... done. Reduced 50697 events to 12675.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 12675
> Number of Outcomes: 389
> Number of Predicates: 13488
> ...done.
> Computing model parameters ...
> Performing 10 iterations.
> 1: ... loglikelihood=-302335.58198350534 0.8420616604532812
> 2: ... loglikelihood=-61602.20311717376 0.9492672150225852
> 3: ... loglikelihood=-30747.954089148297 0.9769217113438665
> 4: ... loglikelihood=-19986.853691639506 0.9850484249561118
> 5: ... loglikelihood=-14672.523462458894 0.9881255301102629
> 6: ... loglikelihood=-11572.587093608756 0.9893879322247865
> 7: ... loglikelihood=-9571.242700030467 0.9900783083811665
> 8: ... loglikelihood=-8185.39402892 0.9906897844053889
> 9: ... loglikelihood=-7174.66904253965 0.9912223602974535
> 10: ... loglikelihood=-6407.4278143846 0.9917746612225575
>
>
> === 1.8.0
>
> done. 50697 events
> Indexing... done.
> Sorting and merging events... done. Reduced 50697 events to 26026.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 26026
> Number of Outcomes: 7668
> Number of Predicates: 15279
> ...done.
> Computing model parameters ...
> Performing 10 iterations.
> 1: ... loglikelihood=-453475.08854769287 1.972503303943034E-5
> 2: ... loglikelihood=-165718.68620632993 0.9509241177978973
> 3: ... loglikelihood=-85388.42871190465 0.9761327100222893
> 4: ... loglikelihood=-56404.00400621838 0.9892104069274316
> 5: ... loglikelihood=-41004.08840359108 0.9938457896916977
> 6: ... loglikelihood=-31539.64788603799 0.9955421425330887
> 7: ... loglikelihood=-25264.889481438582 0.9964889441189814
> 8: ... loglikelihood=-20883.72059438774 0.9972384953744797
> 9: ... loglikelihood=-17699.228362701586 0.9977710712665444
> 10: ... loglikelihood=-15306.654021266759 0.9980669467621358
>
>
> I also get some differences in f-score for other tests that train models,
> but not as significant as when training a lemmatizer model.
>
> -- Richard
>
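To put numbers on how much the indexing summary changed between the two runs, the counters from the logs quoted above can be compared directly. The following is a minimal sketch, not something posted in the thread: the names `LOG_172`, `LOG_180`, `parse_counts`, and `compare` are hypothetical, and the only data used are the summary lines copied from the two logs.

```python
import re

# Summary lines copied verbatim from the two training logs quoted above.
LOG_172 = """\
Sorting and merging events... done. Reduced 50697 events to 12675.
Number of Event Tokens: 12675
Number of Outcomes: 389
Number of Predicates: 13488
"""

LOG_180 = """\
Sorting and merging events... done. Reduced 50697 events to 26026.
Number of Event Tokens: 26026
Number of Outcomes: 7668
Number of Predicates: 15279
"""


def parse_counts(log: str) -> dict:
    """Extract every 'Number of <name>: <int>' counter from a training log."""
    pairs = re.findall(r"Number of ([A-Za-z ]+):\s*(\d+)", log)
    return {name.strip(): int(value) for name, value in pairs}


def compare(old: dict, new: dict) -> dict:
    """Return the new/old ratio for each counter present in both logs."""
    return {key: new[key] / old[key] for key in old if key in new}


if __name__ == "__main__":
    ratios = compare(parse_counts(LOG_172), parse_counts(LOG_180))
    for name, ratio in ratios.items():
        print(f"{name}: 1.8.0 is {ratio:.2f}x the 1.7.2 value")
```

The outcome count stands out: from the same 50697 events, the number of distinct outcomes grows far more than the number of predicates, which is in line with Richard's remark that the counts change more than expected.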
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
Hi all,

> On 11.05.2017, at 18:37, Joern Kottmann wrote:
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.0 Release Candidate 2.

Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same models are used during classification?

E.g. the English parser model seems to create different POS tags now for the sentence "We need a very complicated example sentence , which contains as many constituents and dependencies as possible .". "a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT".

Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same training data is used during training?

I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2, this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.

Cheers,

-- Richard
[GitHub] opennlp pull request #198: updated README.md
GitHub user beylerian opened a pull request:

    https://github.com/apache/opennlp/pull/198

    updated README.md

    https://issues.apache.org/jira/browse/OPENNLP-1058

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/beylerian/opennlp OPENNLP-1058

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/opennlp/pull/198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #198

commit bc97da728bbde15c15adc3c4df7424e3dbb227c9
Author: beylerian
Date:   2017-05-13T17:37:01Z

    updated README.md

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
+1 non-binding

Downloaded the artifacts, then built and executed the unit tests successfully on Mac OS X 10.10.5.

On 2017/05/12 1:37, Joern Kottmann wrote:

The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 2.

The RC 2 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/

The release was made from the Apache OpenNLP 1.8.0 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.0

To use it in a maven build set the version for opennlp-tools or opennlp-uima to 1.8.0 and add the following URL to your settings.xml file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012

The release was made using the OpenNLP release process, documented on the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process

The release contains quite some changes, please refer to the contained issue list for details.

Please vote on releasing these packages as Apache OpenNLP 1.8.0. The vote is open for at least the next 72 hours.

Only votes from OpenNLP PMC are binding, but folks are welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache OpenNLP 1.8.0
[ ] -1 Do not release the packages because...

Thanks!
Jörn

P.S. Here is my +1.