Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-13 Thread William Colen
Given the issues reported by Richard, we should cancel the vote and roll back
the release.

I change my vote to -1 (binding)

2017-05-13 19:08 GMT-03:00 Richard Eckart de Castilho:

>
> > On 13.05.2017, at 22:35, Richard Eckart de Castilho wrote:
> >
> > Should OpenNLP 1.8.0 yield the same results as 1.7.2 when the same
> > training data is used during training?
> >
> > I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
> > this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.
>
> Also, this test which trains and evaluates a lemmatizer model
> takes ~8 sec with 1.7.2 and ~170 sec with 1.8.0. Even when only
> considering the training phase (no evaluation), the test runs
> much faster with 1.7.2 than with 1.8.0.
>
> Here are some details on the training phase.
>
> It seems odd that the events, outcomes, and predicates change that much.
>
> === 1.7.2
>
> done. 50697 events
> Indexing...  done.
> Sorting and merging events... done. Reduced 50697 events to 12675.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 12675
> Number of Outcomes: 389
>   Number of Predicates: 13488
> ...done.
> Computing model parameters ...
> Performing 10 iterations.
>   1:  ... loglikelihood=-302335.58198350534 0.8420616604532812
>   2:  ... loglikelihood=-61602.20311717376  0.9492672150225852
>   3:  ... loglikelihood=-30747.954089148297 0.9769217113438665
>   4:  ... loglikelihood=-19986.853691639506 0.9850484249561118
>   5:  ... loglikelihood=-14672.523462458894 0.9881255301102629
>   6:  ... loglikelihood=-11572.587093608756 0.9893879322247865
>   7:  ... loglikelihood=-9571.242700030467  0.9900783083811665
>   8:  ... loglikelihood=-8185.39402892  0.9906897844053889
>   9:  ... loglikelihood=-7174.66904253965   0.9912223602974535
>  10:  ... loglikelihood=-6407.4278143846   0.9917746612225575
>
>
> === 1.8.0
>
> done. 50697 events
> Indexing...  done.
> Sorting and merging events... done. Reduced 50697 events to 26026.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 26026
> Number of Outcomes: 7668
>   Number of Predicates: 15279
> ...done.
> Computing model parameters ...
> Performing 10 iterations.
>   1:  ... loglikelihood=-453475.08854769287 1.972503303943034E-5
>   2:  ... loglikelihood=-165718.68620632993 0.9509241177978973
>   3:  ... loglikelihood=-85388.42871190465  0.9761327100222893
>   4:  ... loglikelihood=-56404.00400621838  0.9892104069274316
>   5:  ... loglikelihood=-41004.08840359108  0.9938457896916977
>   6:  ... loglikelihood=-31539.64788603799  0.9955421425330887
>   7:  ... loglikelihood=-25264.889481438582 0.9964889441189814
>   8:  ... loglikelihood=-20883.72059438774  0.9972384953744797
>   9:  ... loglikelihood=-17699.228362701586 0.9977710712665444
>  10:  ... loglikelihood=-15306.654021266759 0.9980669467621358
>
>
> I also get some differences in f-score for other tests that train models,
> but not as significant as when training a lemmatizer model.
>
> -- Richard
>
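
One way to quantify how much the counters differ between the two logs above is to parse the summary lines and compare them. The following is only a sketch: `TrainingLogDiff` and its methods are hypothetical names, not part of OpenNLP, and it assumes the trainer prints its counters in the `Number of X: N` form shown in the logs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: pulls the summary counters out of
// a trainer log so that two runs can be compared side by side.
class TrainingLogDiff {

    // Matches lines such as "Number of Outcomes: 389", with or without a
    // leading "> " quote prefix or indentation.
    private static final Pattern COUNTER =
            Pattern.compile("Number of ([A-Za-z ]+?):\\s*(\\d+)");

    // Returns counter name -> value, in the order the counters appear.
    static Map<String, Long> counters(String log) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(log);
        while (m.find()) {
            result.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return result;
    }

    // Returns new/old for every counter that is present in both logs.
    static Map<String, Double> ratios(String oldLog, String newLog) {
        Map<String, Long> before = counters(oldLog);
        Map<String, Long> after = counters(newLog);
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : before.entrySet()) {
            Long updated = after.get(e.getKey());
            if (updated != null && e.getValue() != 0) {
                result.put(e.getKey(), (double) updated / e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Counter lines copied from the 1.7.2 and 1.8.0 logs in this thread.
        String log172 = "Number of Event Tokens: 12675\n"
                + "Number of Outcomes: 389\n"
                + "  Number of Predicates: 13488\n";
        String log180 = "Number of Event Tokens: 26026\n"
                + "Number of Outcomes: 7668\n"
                + "  Number of Predicates: 15279\n";
        ratios(log172, log180).forEach((name, ratio) ->
                System.out.printf("%-13s x%.2f%n", name, ratio));
    }
}
```

On the numbers reported above, the outcomes grow roughly twentyfold (389 to 7668), the event tokens roughly double, and the predicates grow by about 13%.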


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-13 Thread Richard Eckart de Castilho
Hi all,

> On 11.05.2017, at 18:37, Joern Kottmann  wrote:
> 
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.0 Release Candidate 2. 

Should OpenNLP 1.8.0 yield the same results as 1.7.2 when the same
models are used during classification?

E.g. the English parser model seems to create different POS tags now
for the sentence "We need a very complicated example sentence , 
which contains as many constituents and dependencies as possible .".
"a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT".

Should OpenNLP 1.8.0 yield the same results as 1.7.2 when the same
training data is used during training?

I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.

Cheers,

-- Richard
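
To pin down a classification difference like the one described above, the tag sequences produced under each version can be compared position by position. The following is a minimal sketch, assuming the tags have already been collected into arrays by running the same model under 1.7.2 and 1.8.0; `TagDiff` is a hypothetical helper and does not call any OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: lists the positions where two
// tag sequences for the same tokens disagree.
class TagDiff {

    static List<String> diff(String[] tokens, String[] oldTags, String[] newTags) {
        if (tokens.length != oldTags.length || tokens.length != newTags.length) {
            throw new IllegalArgumentException("tokens and tags must align");
        }
        List<String> mismatches = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!oldTags[i].equals(newTags[i])) {
                mismatches.add(i + ": " + tokens[i] + " " + oldTags[i] + " -> " + newTags[i]);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Only the one token whose tags are reported in this thread; the tags
        // of the other tokens are not stated, so they are left out here.
        String[] tokens = {"a"};
        String[] tags172 = {"DT"};
        String[] tags180 = {","};
        diff(tokens, tags172, tags180).forEach(System.out::println);
        // prints "0: a DT -> ,"
    }
}
```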




[GitHub] opennlp pull request #198: updated README.md

2017-05-13 Thread beylerian
GitHub user beylerian opened a pull request:

https://github.com/apache/opennlp/pull/198

updated README.md

https://issues.apache.org/jira/browse/OPENNLP-1058

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/beylerian/opennlp OPENNLP-1058

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #198


commit bc97da728bbde15c15adc3c4df7424e3dbb227c9
Author: beylerian 
Date:   2017-05-13T17:37:01Z

updated README.md






Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-13 Thread Koji Sekiguchi

+1 non-binding

Downloaded the artifacts, then built and ran the unit tests successfully on
Mac OS X 10.10.5.


On 2017/05/12 1:37, Joern Kottmann wrote:

The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
1.8.0 Release Candidate 2.

The RC 2 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/

The release was made from the Apache OpenNLP 1.8.0 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.0

To use it in a Maven build, set the version for opennlp-tools or
opennlp-uima to 1.8.0 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012

The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process

The release contains quite a few changes; please refer to the included
issue list for details.

Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
vote is open for at least the next 72 hours.

Only votes from OpenNLP PMC are binding, but folks are welcome to check
the release candidate and voice their approval or disapproval. The vote
passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache OpenNLP 1.8.0
[ ] -1 Do not release the packages because...
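
The pass rule above can be sketched as a small tally. This is only an illustration: `VoteTally` is a hypothetical class, the ballots in `main` are placeholders rather than the votes cast in this thread, and the sketch encodes only the rule as stated in this message (at least three binding +1 votes).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the pass rule quoted in the vote mail. It is not an
// official ASF or OpenNLP tool and encodes only the rule as stated there.
class VoteTally {

    static final class Vote {
        final String voter;
        final boolean plusOne;  // true for +1, false for -1
        final boolean binding;  // true only for votes from PMC members

        Vote(String voter, boolean plusOne, boolean binding) {
            this.voter = voter;
            this.plusOne = plusOne;
            this.binding = binding;
        }
    }

    // Counts the +1 votes that are binding.
    static long bindingPlusOnes(List<Vote> votes) {
        return votes.stream().filter(v -> v.binding && v.plusOne).count();
    }

    // The vote passes if at least three binding +1 votes are cast.
    static boolean passes(List<Vote> votes) {
        return bindingPlusOnes(votes) >= 3;
    }

    public static void main(String[] args) {
        // Placeholder ballots for illustration only.
        List<Vote> votes = Arrays.asList(
                new Vote("pmc-member-1", true, true),
                new Vote("pmc-member-2", true, true),
                new Vote("contributor-1", true, false),
                new Vote("pmc-member-3", false, true));
        System.out.println("binding +1: " + bindingPlusOnes(votes)); // binding +1: 2
        System.out.println("passes: " + passes(votes));              // passes: false
    }
}
```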


Thanks!

Jörn

P.S. Here is my +1.