Re: [VOTE] Apache OpenNLP 2.3.3 (rc1)

2024-04-22 Thread Tommaso Teofili
+1

tag builds ok, sigs ok.

Regards,
Tommaso

On Mon, 22 Apr 2024 at 08:56, Bruno Kinoshita wrote:

> +1
>
> Tag is building fine on my env:
>
> Apache Maven 3.8.5 (3599d3414f046de2324203b78ddcf9b5e4388aa0)
> Maven home: /opt/apache-maven-3.8.5
> Java version: 17.0.10, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "5.15.0-105-generic", arch: "amd64", family: "unix"
>
> Thank you!
>
> Bruno
>
> On Sun, 21 Apr 2024 at 10:14, Martin Wiesner wrote:
>
> > Hi folks,
> >
> > I have posted a 1st release candidate (rc1) for the Apache OpenNLP 2.3.3
> > release and it is ready for testing.
> >
> > This release brings four dependency updates, two bug fixes, minor
> > corrections in the manual, and working integration tests (IT) again!
> > The ITs were not executed for quite some time, but are executed for every
> > regular Maven build now; future work should include introducing a separate
> > Maven profile for executing ITs.
> > The manual's CSS got modernized; a preview can be found here:
> > https://github.com/apache/opennlp/pull/591
> > Moreover, this release will ship an abbreviation dictionary for the Dutch
> > language.
> >
> > Thank you to everyone who contributed to this release, including all of
> > our users and the people who submitted bug reports, contributed code or
> > documentation enhancements.
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Maven Repo:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1038
> >
> > <repositories>
> > <repository>
> > <id>opennlp-2.3.3-rc1</id>
> > <name>Testing OpenNLP 2.3.3 release candidate</name>
> > <url>
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1038
> > </url>
> > </repository>
> > </repositories>
> >
> > Binaries & Source:
> >
> > https://dist.apache.org/repos/dist/dev/opennlp/opennlp-2.3.3
> >
> > Tag:
> >
> > https://github.com/apache/opennlp/releases/tag/opennlp-2.3.3
> >
> > Release notes:
> >
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12354199
> >
> > The results of the eval tests for the aforementioned tag can be found
> > here: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-releases/12
> >
> > Reminder: The up-to-date KEYS file for signature verification can be
> > found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS
> >
> > Please vote on releasing these packages as Apache OpenNLP 2.3.3.
> > The vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > Please VOTE
> >
> > [+1] go ship it
> > [+0] meh, don't care
> > [-1] stop, there is a ${showstopper}
> >
> > Thanks!
> >
> > Martin | mawiesne
> >
> >
>
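
The voting rule quoted in the call above (open for at least 72 hours, at least three binding +1 votes) can be sketched in a few lines of Python. This is only an illustration, not tooling used by the PMC: `Vote` and `vote_passes` are hypothetical names, the voters in the usage note are placeholders, and the handling of -1 votes is left out because the thread does not specify it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Vote:
    voter: str     # placeholder identifier, not a real person
    value: int     # +1, 0 or -1
    binding: bool  # True only for votes cast by PMC members


def vote_passes(votes, opened, now, min_binding=3, min_open=timedelta(hours=72)):
    """Check the two conditions stated in the call for votes.

    Assumption: only the minimum open period and the count of binding +1
    votes are checked; the effect of a -1 is not modelled here.
    """
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    open_long_enough = (now - opened) >= min_open
    return open_long_enough and binding_plus_ones >= min_binding
```

For example, with two binding and one non-binding +1, `vote_passes` returns False even after 72 hours; a third binding +1 makes it return True.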


Re: [VOTE] Apache OpenNLP 2.3.1 Release Candidate

2023-11-23 Thread Tommaso Teofili
+1

Tommaso

On Thu, 23 Nov 2023 at 12:05, Richard Zowalla wrote:

> +1 (binding)
>
>
> (We should create an issue for the year in the NOTICE file though)
>
> On Wednesday, 22.11.2023 at 15:12 +0100, Martin Wiesner wrote:
> >
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 2.3.1
> > release and it is ready for testing.
> >
> > It is a maintenance release which provides some enhancements.
> > Some of these are related to sentence models and the use of
> > abbreviations, see OPENNLP-570 & OPENNLP-793.
> > Moreover, it switches the ONNX runtime for the 'opennlp-dl' component
> > from the GPU to the CPU-based variant, see OPENNLP-1515.
> > Several other (cleanup) tasks have also been completed.
> >
> > Thank you to everyone who contributed to this release, including all
> > of our users and the people who submitted bug reports, contributed
> > code or documentation enhancements.
> >
> > The release was made using the OpenNLP release process, documented on
> > the website:
> > https://opennlp.apache.org/release.html
> >
> > Maven Repo:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1035
> >
> > <repositories>
> > <repository>
> > <id>opennlp-2.3.1-rc1</id>
> > <name>Testing OpenNLP 2.3.1 release candidate</name>
> > <url>
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1035
> > </url>
> > </repository>
> > </repositories>
> >
> > Binaries & Source:
> >
> > https://dist.apache.org/repos/dist/dev/opennlp/opennlp-2.3.1
> >
> > Tag:
> >
> > https://github.com/apache/opennlp/releases/tag/opennlp-2.3.1
> >
> > Release notes:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12353478
> >
> > The results of the eval tests for the aforementioned tag can be found
> > here:
> > https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-releases/9/
> >
> > Reminder: The up-to-date KEYS file for signature verification can be
> > found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS
> >
> > Please vote on releasing these packages as Apache OpenNLP 2.3.1. The
> > vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to
> > check the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > Please VOTE
> >
> > [+1] go ship it
> > [+0] meh, don't care
> > [-1] stop, there is a ${showstopper}
> >
> > Thanks!
> > mawiesne
>
>


Re: OpenNLP 2.3.0 ?

2023-07-25 Thread Tommaso Teofili
+1 for me too

Tommaso

On Tue, 25 Jul 2023 at 16:04, Martin Wiesner wrote:

>
> I'm also +1.
>
> Best
> Martin
> --
> On Tuesday, July 25, 2023 13:56 CEST, Bruno Kinoshita <brunodepau...@gmail.com> wrote:
>
> Not from me, +1!
>
> On Tue, 25 Jul 2023 at 10:49, Richard Zowalla wrote:
>
> > Hi all,
> >
> > any objections in doing OpenNLP 2.3.0 after we have merged the memory
> > improvement by Martin?
> >
> > We have some dependency updates, resource leak fixes and Java 17.
> >
> > Thoughts?
> >
> > Regards
> > Richard
> >
>
>
>
>


Re: Renaming of master branch to main

2022-12-15 Thread Tommaso Teofili
+1 from me as well.

Tommaso

On Thu, 15 Dec 2022 at 11:58, Richard Zowalla wrote:
>
> Hi,
>
> I am +1 for that one.
>
> Most of the mechanical work is done by INFRA.
> We only need to adjust Buildbot and the GitHub Actions.
>
> Regards
> Richard
>
> On 2022/07/28 13:22:29 Jeff Zemerick wrote:
> > Hi all,
> >
> > I would like the community's input on renaming the master branch
> > to main for the opennlp and opennlp-site repositories. I see quite a few
> > ASF projects have made the change and I personally think we should, too,
> > for the same reason of promoting inclusivity.
> >
> > I think we would have to update the target branch of the outstanding PRs,
> > update the build config, and perhaps let Infra know so Buildbot for
> > opennlp-site will keep working. Anything else?
> >
> > Thoughts? Does this require a vote?
> >
> > Thanks,
> > Jeff
> >


Re: OpenNLP 2.0 release discussion

2022-04-06 Thread Tommaso Teofili
+1

Tommaso

On Wed, 6 Apr 2022 at 04:00, Bruno P. Kinoshita wrote:

>  +1 Jeff, thanks!
>
> Bruno
>
> On Wednesday, 6 April 2022, 02:34:35 am NZST, Jeff Zemerick <jzemer...@apache.org> wrote:
>
>  Hi all,
>
> I would like to propose an OpenNLP 2.0 release for the following reasons:
>
> - There are a few significant changes: Building using Java 11, support for
> ONNX models, automatic model downloading
> - User activity has been somewhat low and a 2.0 release might help bring
> attention to these new features.
> - 1.x has been around for 10+ years. :)
> - Other reasons?
>
> Thoughts? Concerns?
>
> Thanks,
> Jeff
>
>
> Our current master branch has the following changes:
>
> Bug
> [OPENNLP-1353] - DictionaryLemmatizer missing charset
>
> Improvement
> [OPENNLP-565] - Add MASC format support
> [OPENNLP-1185] - Tokenizers should be able to output a new line token
> [OPENNLP-1306] - NameSample overlap exception not helpful
>
> Task
> [OPENNLP-1318] - Add ability to download models from within OpenNLP
> [OPENNLP-1351] - Support ONNX models
> [OPENNLP-1354] - Change build to use Java 11
> [OPENNLP-1355] - Document ONNX capability introduced in OPENNLP-1351
> [OPENNLP-1356] - Document the ONNX implementations
> [OPENNLP-1359] - Build fails with Java 17
> [OPENNLP-1364] - Move setKeepNewLines to the Tokenizer class
>
> Documentation
> [OPENNLP-1319] - The Training API code is outdated in Manual
>


Re: [VOTE] Apache OpenNLP Models 1.0

2021-05-12 Thread Tommaso Teofili
Hi all,

sorry for the delay, I've finally managed to have a look at the artifacts
and they look good to me (sigs, etc.).

+1 from me

Regards,
Tommaso

On Sun, 25 Apr 2021 at 10:10, Bruno P. Kinoshita wrote:

>  Hi Jeff,
>
> Sorry for the delay on voting. From what I recall, one issue raised in the
> last thread was the location of the models. These look to be hosted in
> ASF dist folders, so there shouldn't be any more issues.
>
> Checked signatures and found no errors. Looked at each text file in the
> dist area and everything looks good. Verified that the mentioned files (logs
> archive, example models, etc.) all existed and had no issues.
> Everything OK.
>
> The NOTICE file says 2017, and in Commons we try to update that to the
> year of the release of the component, but that shouldn't be a blocker I
> think.
>
> +1
>
> Thank you!
> Bruno
>
> On Friday, 23 April 2021, 4:42:41 am NZST, Jeff Zemerick <jzemer...@apache.org> wrote:
>
>  Hi all,
>
> Just a reminder that this vote thread is still active.
>
> Thanks,
> Jeff
>
>
> On Wed, Apr 14, 2021 at 10:41 AM Jeff Zemerick wrote:
>
> > All,
> >
> > I am calling a vote to release the Apache OpenNLP models trained on the
> > Universal Dependencies corpus.
> >
> > The models and text files showing training and evaluation results along
> > with information about how the models were trained are available at:
> > https://dist.apache.org/repos/dist/dev/opennlp/ud-models-1.0/
> >
> > Upon a successful vote the models will be made available on the Apache
> > OpenNLP website on the Models Download page (
> > https://opennlp.apache.org/models.html). Work will then continue on a
> > pull request to modify OpenNLP to be able to automatically download and
> > use these pre-trained models when a local model is not provided by the user.
> > (The goal being to lower the barrier of entry into OpenNLP and make its
> > use more convenient.)
> >
> > Please vote on releasing the models as model version 1.0. The vote is
> > open for at least the next 72 hours.
> >
> > Only votes from the OpenNLP PMC are binding but everyone is welcome to
> > check the models and vote. The vote passes if at least three binding +1
> > votes are cast.
> >
> > [ ] +1 Release the models as version 1.0.
> > [ ] -1 Do not release the models as 1.0 because...
> >
> > Thanks!
> > Jeff
> >
> >
>
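
The fallback described in the call above (use a pre-trained model only when the user provides no local one) could look roughly like the following Python sketch. Everything here is hypothetical: `resolve_model`, the cache layout, and the injected `download` callable are illustration names, and nothing is assumed about OpenNLP's actual download API or model URLs.

```python
from pathlib import Path


def resolve_model(local_model, language, cache_dir, download):
    """Return the path of the model file to load.

    local_model: path supplied by the user, or None.
    download:    hypothetical callable (language, target_path) -> None that
                 fetches a pre-trained model; it is injected so that this
                 sketch assumes nothing about the real download mechanism.
    """
    if local_model is not None:
        return Path(local_model)  # a user-supplied model always wins

    cached = Path(cache_dir) / f"{language}.bin"
    if not cached.exists():       # download once, then reuse the cached copy
        cached.parent.mkdir(parents=True, exist_ok=True)
        download(language, cached)
    return cached
```

Passing the downloader in as a parameter keeps the decision logic testable without network access; a real implementation would also need to verify what it downloaded.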


Re: [VOTE] Release OpenNLP Models trained on UD

2021-03-12 Thread Tommaso Teofili
+1

Tommaso

On Sat, 13 Mar 2021 at 08:25, Bruno P. Kinoshita wrote:

>  [x] +1 Release the models as model version 1.0.
>
>
> On Saturday, 13 March 2021, 2:39:37 am NZDT, Jeff Zemerick <jzemer...@apache.org> wrote:
>
>  All,
>
> I am calling a vote to release the Apache OpenNLP models trained on the
> Universal Dependencies corpus. The models were described in a previous
> thread you can see at
> https://www.mail-archive.com/dev@opennlp.apache.org/msg03054.html.
>
> This vote is to release the models as version 1.0. (The models are still
> available in the Dropbox folder at
>
> https://www.dropbox.com/sh/p8focuz0qwvw84b/AAC6GqO8mqZn_xkAqHZsVAsoa?dl=0&lst=
> along with text files showing the training and evaluation results).
>
> Upon a successful vote the models will be made available on the Apache
> OpenNLP website on the Models Download page (
> https://opennlp.apache.org/models.html). Work will then continue to modify
> OpenNLP to be able to automatically download and use these pre-trained
> models when none is specifically loaded by the user. (The goal being to
> lower the barrier of entry into OpenNLP and make its use more convenient.)
>
> Please vote on releasing the models in the linked DropBox folder as model
> version 1.0. The vote is open for at least the next 72 hours.
>
> Only votes from the OpenNLP PMC are binding but everyone is welcome to
> check the models and vote. The vote passes if at least three binding +1
> votes are cast.
>
> [ ] +1 Release the models as model version 1.0.
> [ ] -1 Do not release the models because...
>
> Thanks!
> Jeff
>


Re: [VOTE] Apache OpenNLP 1.9.3 Release Candidate

2020-07-29 Thread Tommaso Teofili
+1 from me, build, sigs, tag look good.

Regards,
Tommaso

On Tue, 28 Jul 2020 at 10:48, Bruno P. Kinoshita wrote:

> It worked after I imported keys from
> https://dist.apache.org/repos/dist/release/opennlp/KEYS
>
> [x] +1 Release the packages as Apache OpenNLP 1.9.3
>
>
> Thanks!
> Bruno
>
>
> On Monday, 27 July 2020, 12:00:29 am NZST, Jeff Zemerick <jzemer...@apache.org> wrote:
>
> Looks like I'm in there as jzemerick. See if I'm doing this correctly:
>
> wget https://people.apache.org/keys/group/opennlp.asc
> gpg --import opennlp.asc
>
> wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz
> wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz.asc
>
> gpg --verify opennlp-distr-1.9.3-bin.tar.gz.asc
> gpg: assuming signed data in 'opennlp-distr-1.9.3-bin.tar.gz'
> gpg: Signature made Fri Jul 24 15:21:24 2020 UTC
> gpg:                using RSA key 6786BCFFBD2AE66E737FE97760E63AD841EF12D8
> gpg: Good signature from "Jeff Zemerick (CODE SIGNING KEY) <jzemer...@apache.org>" [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 6786 BCFF BD2A E66E 737F  E977 60E6 3AD8 41EF 12D8
>
> Jeff
>
>
> On Sun, Jul 26, 2020 at 5:25 AM Bruno P. Kinoshita wrote:
>
> > Hi,
> >
> >
> > Built successfully from tag with Java 8 on Ubuntu LTS. Had a look at one
> > file from the dist area, and the contents looked OK (license, notice, jars
> > were using the right version 1.9.3 too).
> >
> >
> > Also checked the signatures using some shell script I normally use, but it
> > failed to validate. I think it failed to find your key in
> > https://people.apache.org/keys/group/opennlp.asc. Have you added your key
> > there? I searched for Jeff and jzonthemtn, but couldn't find it.
> >
> >
> > Cheers
> >
> > Bruno
> >
> >
> >
> > On Saturday, 25 July 2020, 11:08:12 pm NZST, Jeff Zemerick <jzemer...@apache.org> wrote:
> >
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 1.9.3 release
> > and it is ready for testing.
> >
> > The distributables can be downloaded from:
> >
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/
> >
> > The release was made from the Apache OpenNLP 1.9.3 tag at:
> > https://github.com/apache/opennlp/tree/opennlp-1.9.3
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.9.3 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1027
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.9.3. The vote
> > is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.9.3
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Jeff
> >
>
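
The download links in this thread follow the standard Maven repository layout: the group ID as a directory path, then the artifact ID, the version, and a file name built from those parts. A minimal Python sketch of that composition follows; `artifact_url` is a hypothetical helper name, and the coordinates in the usage note are read off the URLs quoted above.

```python
def artifact_url(repo_url, group_id, artifact_id, version,
                 classifier=None, packaging="jar"):
    """Build a download URL using the standard Maven repository layout.

    Layout: <repo>/<group as path>/<artifact>/<version>/<artifact>-<version>[-<classifier>].<packaging>
    """
    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"
    file_name += f".{packaging}"

    group_path = group_id.replace(".", "/")  # org.apache.opennlp -> org/apache/opennlp
    return f"{repo_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{file_name}"
```

Called with the staging repository `https://repository.apache.org/content/repositories/orgapacheopennlp-1027`, group `org.apache.opennlp`, artifact `opennlp-distr`, version `1.9.3`, classifier `bin` and packaging `tar.gz`, it reproduces the binary tarball URL used in the wget command earlier in the thread.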


Re: [VOTE] Apache OpenNLP 1.9.2 Release Candidate

2019-12-23 Thread Tommaso Teofili
+1 (binding)

tag build succeeds (jdk 8), signatures ok.

Regards,
Tommaso

On Mon, 23 Dec 2019 at 13:32, Jeff Zemerick wrote:

> +1 binding
>
> verified signatures
> built and tested from opennlp-1.9.2 tag using openjdk 8
>
> On Fri, Dec 20, 2019 at 11:07 AM Jeff Zemerick wrote:
>
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 1.9.2 release
> > and it is ready for testing.
> >
> > The distributables can be downloaded from:
> >
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1026/org/apache/opennlp/opennlp-distr/1.9.2/
> >
> > The release was made from the Apache OpenNLP 1.9.2 tag at:
> > https://github.com/apache/opennlp/tree/opennlp-1.9.2
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.9.2 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1026
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.9.2. The vote
> > is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.9.2
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Jeff
> >
>


Re: OpenNLP 1.9.2 and Java 8/11

2019-12-14 Thread Tommaso Teofili
+1

Thanks Jeff!

On Sat, 14 Dec 2019 at 15:32, Jeff Zemerick wrote:

> During preparation for a 1.9.2 release it was noticed that the current
> master branch fails a few of the regression tests when built using OpenJDK
> 11. (All tests pass when using OpenJDK 8.) Unless there are any significant
> objections, the 1.9.2 release will be built using OpenJDK 8, and the task
> [1] to fix the failing regression tests on OpenJDK 11 will be handled
> in the next minor release.
>
> Thanks,
> Jeff
>
> [1] https://issues.apache.org/jira/browse/OPENNLP-1285
>


Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-07-02 Thread Tommaso Teofili
+1

On Mon, 2 Jul 2018 at 10:34, Rodrigo Agerri wrote:

> +1
>
> Rodrigo
>
> On Sun, Jul 1, 2018 at 12:42 AM, Koji Sekiguchi wrote:
> > I tested mvn install and some Eval tests (OntoNotes4NameFinderEval,
> > Conll02NameFinderEval, OntoNotes4PosTaggerEval) which use
> > FeatureGeneratorUtil.
> >
> > +1
> >
> > Koji
> >
> >
> >
> > On 2018/06/29 20:45, Jeff Zemerick wrote:
> >>
> >> Hi folks,
> >>
> >> I have posted a 2nd release candidate for the Apache OpenNLP 1.9.0 release
> >> and it is ready for testing.
> >>
> >> The distributables can be downloaded from:
> >>
> >> https://repository.apache.org/content/repositories/orgapacheopennlp-1022/org/apache/opennlp/opennlp-distr/1.9.0/
> >>
> >> The release was made from the Apache OpenNLP 1.9.0 RC2 tag at:
> >> https://github.com/apache/opennlp/tree/opennlp-1.9.0-rc2
> >>
> >> To use it in a maven build set the version for opennlp-tools or
> >> opennlp-uima to 1.9.0 and add the following URL to your settings.xml file:
> >> https://repository.apache.org/content/repositories/orgapacheopennlp-1022
> >>
> >> The release was made using the OpenNLP release process, documented on the
> >> website:
> >> https://opennlp.apache.org/release.html
> >>
> >> Please vote on releasing these packages as Apache OpenNLP 1.9.0. The vote
> >> is open for at least the next 72 hours.
> >>
> >> Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> >> the release candidate and vote.
> >> The vote passes if at least three binding +1 votes are cast.
> >>
> >> [ ] +1 Release the packages as Apache OpenNLP 1.9.0
> >> [ ] -1 Do not release the packages because...
> >>
> >> Thanks!
> >> Jeff
> >>
> >
>


NLP - OSS workshop @ ACL 2018

2018-03-15 Thread Tommaso Teofili
Hi all (sorry for cross posting),

this year ACL 2018 [1] will host a workshop on open source software for NLP [2].
The CFP is open until March 25th; you can find the list of topics at [3].
Here's the list of invited speakers:
- Christopher Manning, Stanford University
- Matthew Honnibal and Ines Montani, Explosion AI
- Joel Nothman, University of Sydney

I think it'd be a good opportunity for projects at ASF to share
interesting work and experience in the NLP OSS space.

Regards,
Tommaso

[1] : http://acl2018.org/
[2] : http://nlposs.github.io
[3] : https://nlposs.github.io/#cfp


Re: [DISCUSS] - (ONIP-1) Better language model support

2018-02-01 Thread Tommaso Teofili
OK Suneel, I'll write up a more detailed design in a Google Doc and share it here
as soon as I have it.

Regards,
Tommaso

On Sat, 27 Jan 2018 at 18:32, Suneel Marthi <suneel.mar...@gmail.com> wrote:

> Thanks Tommaso.
>
> Could you share a Google Doc with the design? We can post it to the
> Wiki after the Google Doc has been finalized.
>
> It's easier to comment on and make changes to a Google Doc.
>
> On Sat, Jan 27, 2018 at 9:50 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>
> > Hi all,
> >
> > recently I've created
> > https://cwiki.apache.org/confluence/display/OPENNLP/ONIP-1+Better+language+model+support
> > as a description of possible useful improvements to our ngram language model
> > implementation.
> > Feedback welcome.
> >
> > Regards,
> > Tommaso
> >
> > p.s.:
> > we created a wiki page containing possible such improvements at
> > https://cwiki.apache.org/confluence/display/OPENNLP/OpenNLP+Improvement+Proposals,
> > feel free to create other proposals
> >
>


[DISCUSS] - (ONIP-1) Better language model support

2018-01-27 Thread Tommaso Teofili
Hi all,

recently I've created
https://cwiki.apache.org/confluence/display/OPENNLP/ONIP-1+Better+language+model+support
as a description of possible useful improvements to our ngram language model
implementation.
Feedback welcome.

Regards,
Tommaso

p.s.:
we created a wiki page containing possible such improvements at
https://cwiki.apache.org/confluence/display/OPENNLP/OpenNLP+Improvement+Proposals,
feel free to create other proposals


Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-21 Thread Tommaso Teofili
+1 build ok, tag ok, sigs ok

Tommaso

On Thu, 21 Dec 2017 at 17:35, Dan Russ wrote:

> [ X] +1 Release the packages as Apache OpenNLP 1.8.4
>
> > On Dec 21, 2017, at 9:44 AM, Jeff Zemerick wrote:
> >
> > Hi Folks,
> >
> > I have posted a first release candidate for the Apache OpenNLP 1.8.4
> > release and it is ready for testing.
> >
> > The RC1 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1020/org/apache/opennlp/opennlp-distr/1.8.4
> >
> > The release was made from the Apache OpenNLP 1.8.4 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.4
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.4 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1020
> >
> > The release was made using the OpenNLP release process, documented on the
> > Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite a few changes; please refer to the included
> > issue list for details.
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.4. The vote is
> > open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> > release candidate and voice their approval or disapproval. The vote passes
> > if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> > Jeff Zemerick
>
>


Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-10-31 Thread Tommaso Teofili
+1 (binding)

Tommaso

On Tue, 31 Oct 2017 at 10:45, Suneel Marthi <suneel.mar...@gmail.com> wrote:

> +1 binding
>
> Sent from my iPhone
>
> > On Oct 31, 2017, at 3:04 PM, Rodrigo Agerri wrote:
> >
> > +1
> >
> > Rodrigo
> >
> > On Tue, Oct 31, 2017 at 2:37 AM, Koji Sekiguchi wrote:
> >> +1
> >>
> >> - checked text files in the zipped model file
> >> - verified signatures
> >> - executed LanguageDetector using the model file
> >>
> >> Koji
> >>
> >>
> >>> On 2017/10/30 22:30, William Colen wrote:
> >>>
> >>> The Apache OpenNLP PMC would like to call for a Vote on the Language
> >>> Detector model for Apache OpenNLP 1.8.3 Release Candidate 3.
> >>>
> >>> The Release artifacts can be downloaded from:
> >>>
> >>> http://people.apache.org/~colen/models/langdetect-183/rc3/
> >>>
> >>> The model was built with Apache OpenNLP 1.8.3 release, trained with a
> >>> portion of the Leipzig corpus, which can be found under this tag:
> >>>
> >>> https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3
> >>>
> >>> The model binary includes the NOTICE, LICENSE and also a README with
> >>> details of supported languages, how the Leipzig corpus was created and the
> >>> model was trained. For your convenience the README is available here:
> >>>
> >>> https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3/leipzig/resources/README.txt
> >>>
> >>> A detailed evaluation report is available here:
> >>>
> >>> http://people.apache.org/~colen/models/langdetect-183/rc3/langdetect-183.bin.report.txt
> >>>
> >>> To use Language Detector, please follow the documentation here:
> >>>
> >>> http://opennlp.apache.org/docs/1.8.3/manual/opennlp.html#tools.langdetect
> >>>
> >>> It is important to note that this model is trained for and works well with
> >>> longer texts that have at least 2 sentences from the same language.
> >>>
> >>> The artifacts have been signed with the Key - 524A9649 found at
> >>>
> >>> http://people.apache.org/keys/group/opennlp.asc
> >>>
> >>> Please vote on releasing the model as Apache OpenNLP Language Detector
> >>> Model 1.8.3. The vote is open for either the next 72 hours or a minimum of
> >>> 3 +1 PMC binding votes, whichever happens earlier.
> >>>
> >>> Only votes from OpenNLP PMC are binding, but folks are welcome to check
> >>> the release candidate and voice their approval or disapproval. The vote passes
> >>> if at least three binding +1 votes are cast.
> >>>
> >>> [ ] +1 Release the packages as Apache OpenNLP Language Detector Model 1.8.3
> >>>
> >>> [ ] -1 Do not release the packages because...
> >>>
> >>> Thanks again to all the committers and contributors for their work over
> >>> the past few weeks.
> >>>
> >>
>


Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-25 Thread Tommaso Teofili
+1 (binding)

- source build from tag ok
- sigs and checks ok

On Wed, 25 Oct 2017 at 18:09, Steve Blackmon <sblack...@apache.org> wrote:

>  +1 non-binding
>
> - source builds, tests pass
> - verified checksums and signatures
>
> Steve Blackmon
> sblack...@apache.org
>
> On Oct 25, 2017 at 10:17 AM, Dan Russ wrote:
>
>
> +1 burrito
>
> Ran unit tests on my downstream code that uses opennlp-tools.
>
> On Oct 25, 2017, at 6:58 AM, Suneel Marthi wrote:
>
> +1 binding
>
> 1. Verified Sigs and hashes
> 2. Ran a clean build from {src} * {zip, tar}
> 3. All unit tests pass
>
> On Wed, Oct 25, 2017 at 3:08 PM, Bruno P. Kinoshita <brunodepau...@yahoo.com.br.invalid> wrote:
>
> [ X ] +1 Release the packages as Apache OpenNLP 1.8.3
>
> `mvn clean test install` working fine, checked artefacts signatures,
> matching with what was in the vote e-mail.
>
> Currently on tag 1.8.3, commit b317159cb9857dc509c08a31a98dc61209f39bff
>
> Thanks for preparing this release.
>
> Cheers
> Bruno
>
>
>
> 
> From: Suneel Marthi 
> To: dev@opennlp.apache.org; us...@opennlp.apache.org
> Sent: Tuesday, 24 October 2017 10:29 PM
> Subject: [VOTE] Apache OpenNLP 1.8.3 Release Candidate
>
>
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.3 Release Candidate.
>
> The Release artifacts can be downloaded from:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1010/org/apache/opennlp/opennlp-distr/1.7.2/
>
> The release was made from the Apache OpenNLP 1.8.3 tag at
>
> https://github.com/apache/opennlp/tree/opennlp-1.8.3
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.3
>
> and add the following URL to your settings.xml file:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1019/org/apache/opennlp/opennlp-distr/1.8.3/
>
> The artifacts have been signed with the Key - D3541808 found at
>
> http://people.apache.org/keys/group/opennlp.asc
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.3. The vote
> is open for either the next 72 hours or a minimum of 3 +1 PMC binding votes,
> whichever happens earlier.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.3
>
> [ ] -1 Do not release the packages because...
>
> Thanks again to all the committers and contributors for their work
> over the past few weeks.
>
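
Several voters above report checking "hashes" or "checksums" alongside the signatures. A minimal Python sketch of the checksum half follows, assuming a SHA-512 checksum file whose first whitespace-separated token is the hex digest; the function names and that file format are assumptions for illustration, not something stated in the thread. It only detects corruption: confirming who produced the artifact still requires checking the signature with gpg.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare the artifact's SHA-512 with the digest stored in checksum_file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by the file name (the common "digest  name" layout).
    """
    tokens = Path(checksum_file).read_text().split()
    expected = tokens[0].lower() if tokens else ""
    return sha512_of(artifact) == expected
```

An empty or malformed checksum file simply yields False rather than raising, which is the safer default for a release check.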


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-11 Thread Tommaso Teofili
+1

Tommaso

On Mon, 11 Sep 2017 at 09:12, Joern Kottmann wrote:

> Hi Folks,
>
>
> I have posted a second release candidate for the Apache OpenNLP 1.8.2
> release and it is ready for testing.
>
>
> The RC 2 distributables can be downloaded from here:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/
>
>
> The release was made from the Apache OpenNLP 1.8.2 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.2
>
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.2 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1018
>
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
> The release contains quite a few changes; please refer to the included
> issue list for details.
>
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
> open for at least the next 72 hours.
>
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.2
> [ ] -1 Do not release the packages because...
>
>
> Thanks!
>
> Jörn
>
> P.S. Here is my +1.
>


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-05 Thread Tommaso Teofili
+1

Tommaso

On Tue, 5 Sep 2017 at 05:08, Suneel Marthi wrote:

> +1 binding
>
> On Mon, Sep 4, 2017 at 5:41 PM, Joern Kottmann wrote:
>
> > Hi Folks,
> >
> >
> > I have posted a first release candidate for the Apache OpenNLP 1.8.2
> > release and it is ready for testing.
> >
> >
> > The RC 1 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/
> >
> >
> > The release was made from the Apache OpenNLP 1.8.2 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.2
> >
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.2 and add the following URL to your settings.xml
> > file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1017
> >
> > The release was made using the OpenNLP release process, documented on
> > the Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite some changes, please refer to the contained
> > issue list for details.
> >
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote is
> > open for at least the next 72 hours.
> >
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> > release candidate and voice their approval or disapproval. The vote passes
> > if at least three binding +1 votes are cast.
> >
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.8.2
> > [ ] -1 Do not release the packages because...
> >
> >
> > Thanks!
> >
> > Jörn
> >
> > P.S. Here is my +1.
> >
>


Re: Release of TREC Dynamic Domain: Polar Dataset

2017-08-15 Thread Tommaso Teofili
cool, thanks Chris for sharing.

Regards,
Tommaso

On Wed, 9 Aug 2017 at 18:56, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi,
>
> We have released the dataset we collected in 2015-16 in the Polar domain,
> called the TREC Dynamic Domain Polar dataset.
>
> Researchers interested in a rich dataset collected across the scientific
> and deep web can mine its HTML pages, PDF files, images, video, audio, and
> other formats for scientific insights.
>
> The data is described here:
>
> https://github.com/chrismattmann/trec-dd-polar
>
> And available from the NSF Arctic Data Center here:
>
> https://arcticdata.io/catalog/#view/doi:10.18739/A2280J
>
> If you use the dataset in your work, please consider citing it:
>
> @inproceedings{burgess2015trec,
>   title={TREC Dynamic Domain: Polar Science.},
>   author={Burgess, Annie Bryant and Mattmann, Chris and Totaro, Giuseppe
> and McGibbney, Lewis John and Ramirez, Paul M},
>   booktitle={TREC},
>   year={2015}
> }
>
> (our TREC paper, and/or the DOI from the actual dataset).
>
> Enjoy!
>
> Cheers,
> Chris Mattmann
>
>
>
> ++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices
> (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>


Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Tommaso Teofili
+1 sigs and build ok, langdetect ok.

Regards,
Tommaso

On Fri, 7 Jul 2017 at 15:55, William Colen wrote:

> +1 - Tested with multiple other projects. Tested language detector.
>
> 2017-07-07 10:52 GMT-03:00 Joern Kottmann :
>
> > +1, I ran the eval tests and they passed
> >
> > Jörn
> >
> > On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita
> >  wrote:
> > > Build passing OK with the following environment:
> > > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T05:41:47+13:00)
> > > Maven home: /opt/maven
> > > Java version: 1.8.0_131, vendor: Oracle Corporation
> > > Java home: /usr/lib/jvm/java-8-oracle/jre
> > > Default locale: en_US, platform encoding: UTF-8
> > > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"
> > >
> > > Had a look at simple reports (findbugs, pmd), all looking good.
> > > [ X ] +1 Release the packages as Apache OpenNLP 1.8.1
> > >
> > > Thanks
> > > Bruno
> > > 
> > > On Thursday, 6 July 2017, 1:21:32 AM NZST, Suneel Marthi <
> > smar...@apache.org> wrote:
> > >
> > >
> > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.1
> > > Release Candidate 3.
> > >
> > > The Release artifacts can be downloaded from:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1016/org/apache/opennlp/opennlp-distr/1.8.1/
> > >
> > > The release was made from the Apache OpenNLP 1.8.1 tag at
> > >
> > > https://github.com/apache/opennlp/tree/opennlp-1.8.1
> > >
> > > To use it in a maven build set the version for opennlp-tools or opennlp-uima
> > > to 1.8.1 and add the following URL to your settings.xml file:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1016/
> > >
> > > The artifacts have been signed with the Key - D3541808 found at
> > >
> > > http://people.apache.org/keys/group/opennlp.asc
> > >
> > > Please vote on releasing these packages as Apache OpenNLP 1.8.1. The vote is
> > > open for the next 72 hours *ending on Saturday, July 8AM EST *.
> > >
> > > Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> > > release candidate and voice their approval or disapproval. The vote passes
> > > if at least three binding +1 votes are cast.
> > >
> > > [ ] +1 Release the packages as Apache OpenNLP 1.8.1
> > >
> > > [ ] -1 Do not release the packages because...
> > >
> > > Thanks again to all the committers and contributors for their work
> > > over the past few weeks.
> >
>


Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Tommaso Teofili
thanks Thamme for bringing this to the list!


On Wed, 5 Jul 2017 at 03:49, Thamme Gowda wrote:

> Hello OpenNLP Devs,
>
> I am working on text classification using word embeddings such as
> GloVe/Word2Vec and LSTM networks.
> It would be interesting to see whether we can use this as a document
> categorizer in OpenNLP, especially for sentiment analysis.
>
> I have already raised a PR to the sandbox repo -
> https://github.com/apache/opennlp-sandbox/pull/3
>
> This is the first version, and I hope to receive feedback from the dev
> community to make it work for everyone.
>
> Here are the design choices I have made for the initial version:
>
>    - Using pre-trained GloVe vectors: I felt the GloVe vector format is
>      clean and easily customizable in terms of dimensions and vocabulary
>      size (I have also been reading a lot about them from the Stanford NLP
>      group).
>       - Training GloVe vectors isn't hard either; we can do it using the
>         original C library as well as DL4J.
>    - Using DL4J's multi-layer networks with LSTM instead of reinventing
>      this on the JVM for OpenNLP
>
>
> Please share your feedback here or on the github page
> https://github.com/apache/opennlp-sandbox/pull/3 .
>
>
I think the approach outlined here sounds good; we could incorporate the PR
as soon as it implements the Doccat API.
Then we can see whether and how it makes sense to adjust it to use other
types of embeddings (e.g. paragraph vectors) and/or different network
setups (e.g. more hidden layers, bidirectional LSTM, etc.).

Looking forward to seeing this move forward,
Regards,
Tommaso


>
> Thanks,
> TG
>
>
> --
> *Thamme Gowda *
> @thammegowda  |
> http://scf.usc.edu/~tnarayan/
> ~Sent via somebody's Webmail server
>


Re: Public datasets for Semantic Relationship Extraction

2017-06-29 Thread Tommaso Teofili
sure, sounds good to me.
Best is to open separate issues for each of the tasks.

Regards,
Tommaso

On Thu, 29 Jun 2017 at 16:01, Chris Mattmann wrote:

> Hey Tommaso I was thinking both…but mainly use the datasets for specific
> tasks since
> they seem to be open. Labeled data is hard to come by (
>
> Thoughts?
>
> Cheers,
> Chris
>
>
>
> On 6/28/17, 11:43 PM, "Tommaso Teofili"  wrote:
>
> Hi Chris,
>
> what do you mean specifically ? Leverage some of the works mentioned
> in the
> papers or leverage datasets for specific tasks ? Or both ?
> IIRC there was an OpenNLP page mentioning its usage for the bio NLP
> task,
> not sure about the others.
>
> Regards,
> Tommaso
>
>
> On Wed, 28 Jun 2017 at 20:58, Chris Mattmann <mattm...@apache.org> wrote:
>
> > Ahh here it is, sorry about that:
> >
> >
> https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets
> >
> >
> >
> > On 6/28/17, 11:52 AM, "Suneel Marthi"  wrote:
> >
> > Forced me to join that group first - so will patiently wait for the group
> > moderator to consider/rule out my application to join that group and then
> > maybe I get to read that post. 🙏
> >
> >
> >
> > On Wed, Jun 28, 2017 at 2:44 PM, Chris Mattmann <
> mattm...@apache.org>
> > wrote:
> >
> > > Hi Team,
> > >
> > > Anything here that we can use in OpenNLP?
> > >
> > > https://www.linkedin.com/groups/131222/131222-
> > > 6284423593917063169?midToken=AQGRDKND99GRHQ&trk=eml-b2_
> > > anet_digest_of_digests-hero-11-discussion~subject&
> > > trkEmail=eml-b2_anet_digest_of_digests-hero-11-discussion~
> > > subject-null-uh2g~j4hb54j7~h2-null-communities~group~
> > >
> discussion&lipi=urn%3Ali%3Apage%3Aemail_b2_anet_digest_of_digests%
> > > 3BnYRsTix4QoG8YsVuU%2FryIg%3D%3D
> > >
> > > CC’ing dev@tika too.
> > >
> > > Cheers,
> > > Chris
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>


Re: Public datasets for Semantic Relationship Extraction

2017-06-28 Thread Tommaso Teofili
Hi Chris,

what do you mean specifically? Leveraging some of the work mentioned in the
papers, or leveraging the datasets for specific tasks? Or both?
IIRC there was an OpenNLP page mentioning its usage for the bio NLP task,
not sure about the others.

Regards,
Tommaso


On Wed, 28 Jun 2017 at 20:58, Chris Mattmann wrote:

> Ahh here it is, sorry about that:
>
> https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets
>
>
>
> On 6/28/17, 11:52 AM, "Suneel Marthi"  wrote:
>
> Forced me to join that group first - so will patiently wait for the group
> moderator to consider/rule out my application to join that group and then
> maybe I get to read that post. 🙏
>
>
>
> On Wed, Jun 28, 2017 at 2:44 PM, Chris Mattmann 
> wrote:
>
> > Hi Team,
> >
> > Anything here that we can use in OpenNLP?
> >
> > https://www.linkedin.com/groups/131222/131222-
> > 6284423593917063169?midToken=AQGRDKND99GRHQ&trk=eml-b2_
> > anet_digest_of_digests-hero-11-discussion~subject&
> > trkEmail=eml-b2_anet_digest_of_digests-hero-11-discussion~
> > subject-null-uh2g~j4hb54j7~h2-null-communities~group~
> > discussion&lipi=urn%3Ali%3Apage%3Aemail_b2_anet_digest_of_digests%
> > 3BnYRsTix4QoG8YsVuU%2FryIg%3D%3D
> >
> > CC’ing dev@tika too.
> >
> > Cheers,
> > Chris
> >
> >
> >
> >
> >
>
>
>
>


Re: [VOTE] Migrate our main repositories to GitHub

2017-06-28 Thread Tommaso Teofili
+1 to migrate to gitbox [1]

Regards,
Tommaso

[1] : https://gitbox.apache.org/

On Tue, 27 Jun 2017 at 21:54, Oleg Tikhonov wrote:

> [x] +1 Migrate all repositories to GitHub
>
>
>
> On Tue, Jun 27, 2017 at 10:48 PM, Chris Mattmann 
> wrote:
>
> > If you are talking about using Apache Gitbox, then yes I am +1 for this.
> >
> > Thanks,
> > Chris
> >
> >
> >
> >
> > On 6/27/17, 3:30 AM, "Joern Kottmann"  wrote:
> >
> > Hello all,
> >
> > let's decide here whether we want to move our main repository, currently
> > hosted at Apache, to GitHub instead. This will make our process a bit
> > easier because we can eliminate one remote from our workflow.
> >
> >  [ ] +1 Migrate all repositories to GitHub
> >  [ ] -1 Do not migrate,  because...
> >
> > Thanks,
> > Jörn
> >
> >
> >
> >
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Tommaso Teofili
+1 (binding)

Regards,
Tommaso

p.s.:

+1 also to Bruno's side comments

On Thu, 18 May 2017 at 12:43, Bruno P. Kinoshita wrote:

>
> [ X ] +1 Release the packages as Apache OpenNLP 1.8.0
>
> Not binding
>
> Side note: would be nice later to start fixing some issues found via
> FindBugs. Running `mvn clean findbugs:findbugs findbugs:gui` shows several
> errors, some seem important, like using equals() on array objects (which
> compares references, so it is false for two distinct arrays even when
> their contents match).
>
> See
>
>
> https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/TokenTag.java#L85
>
> And
>
>
>
> https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/POSTaggerNameFeatureGenerator.java#L59
> Plus other NullPointerExceptions that can be prevented, and other minor
> issues. Not blockers for the release though, IMO.
>
> Cheers
> Bruno
>
>
> 
> From: Joern Kottmann 
> To: dev@opennlp.apache.org
> Sent: Thursday, 18 May 2017 9:49 AM
> Subject: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3
>
>
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.0 Release Candidate 3.
>
> The RC 3 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/
>
> The release was made from the Apache OpenNLP 1.8.0 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013
>
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
> The release contains quite some changes, please refer to the contained
> issue list for details.
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> vote is open for at least the next 72 hours.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check
> the release candidate and voice their approval or disapproval. The vote
> passes if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> [ ] -1 Do not release the packages because...
>
> Thanks!
>
> Jörn
>
> P.S. Here is my +1.
>
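
The array-equality pitfall from Bruno's side note can be illustrated with a small sketch. This is not the code from TokenTag or POSTaggerNameFeatureGenerator; the class and method names below are hypothetical and only show why FindBugs flags equals() on arrays and what the usual replacement looks like.

```java
import java.util.Arrays;

public class ArrayEqualsSketch {

    // Pattern that FindBugs flags: arrays inherit Object.equals,
    // so this compares references, not contents.
    public static boolean equalsByReference(String[] a, String[] b) {
        return a.equals(b);
    }

    // Usual fix: compare element by element with Arrays.equals.
    public static boolean equalsByContent(String[] a, String[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        String[] first = {"NN", "VB"};
        String[] second = {"NN", "VB"};

        System.out.println(equalsByReference(first, second)); // false: two distinct arrays
        System.out.println(equalsByReference(first, first));  // true: same reference
        System.out.println(equalsByContent(first, second));   // true: same contents
    }
}
```

When a class overrides equals() this way, hashCode() would normally switch to Arrays.hashCode for the same field so the two stay consistent.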


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread Tommaso Teofili
+1 (binding)

- source distr build succeeds
- build from tag succeeds
- signatures and hashes ok

Regards,
Tommaso

On Fri, 12 May 2017 at 01:11, Suneel Marthi wrote:

> +1 binding
>
> 1. Downloaded artifacts and ran thru a clean build - all unit tests pass
> 2. verified sigs and hashes
>
> On Thu, May 11, 2017 at 9:37 AM, Joern Kottmann 
> wrote:
>
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 2.
> >
> > The RC 2 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/
> >
> > The release was made from the Apache OpenNLP 1.8.0 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.0
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> > file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1012
> >
> > The release was made using the OpenNLP release process, documented on
> > the Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite some changes, please refer to the contained
> > issue list for details.
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> > vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > the release candidate and voice their approval or disapproval. The vote
> > passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> > [ ] -1 Do not release the packages because...
> >
> >
> > Thanks!
> >
> > Jörn
> >
> > P.S. Here is my +1.
> >
>


Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-22 Thread Tommaso Teofili
+1

- checked sigs
- build ok
- license ok

Regards,
Tommaso


On Sun, 22 Jan 2017 at 18:18, Joern Kottmann wrote:

> On Sat, 2017-01-21 at 21:09 -0500, Jeffrey Zemerick wrote:
> > I went to the opennlp-distr/README for a summary of changes in 1.7.1 but I
> > think it is the same as it was for 1.7.0. Is that file typically updated
> > for revision releases? The link at the bottom of the RELEASE_NOTES to the
> > fixed JIRA issues is issuesFixed/jira-report.html. Minor stuff but thought
> > I'd ask.
> >
> >
>
> Yes, this file should be updated. We usually do this, but this time we
> didn't. I think we should release anyway, and if we have to do an RC 2 we
> can update it.
>
> There is also another minor thing with a test for maxent qn which is
> not configured correctly.
>
> Anyway, besides that, which we can get right in 1.7.2, I don't know of
> anything that would keep us from taking RC 1 for the 1.7.1 release. I
> will have a more detailed look at it now.
>
> Jörn
>


Re: Commit message style

2017-01-10 Thread Tommaso Teofili
+1

Tommaso

On Tue, 10 Jan 2017 at 11:20, Rodrigo Agerri wrote:

> +1 for the OPENNLP-xxx: commit message.
>
>
>
> On Tue, Jan 10, 2017 at 12:51 AM, William Colen 
> wrote:
>
> > +1 for the OPENNLP-xxx: commit message.
> > Fast to find a commit.
> >
> >
> > 2017-01-09 21:24 GMT-02:00 Joern Kottmann :
> >
> > > On Mon, 2017-01-09 at 17:02 -0500, Jeffrey Zemerick wrote:
> > > > I'm personally a fan of the issue number being the first thing on the
> > > > subject line, like "OPENNLP-xxx: commit message." For me it gives a
> > > > consistent place to look for the issue without having to read the full
> > > > message. (That way you can also see the issue number in GitHub's commit
> > > > list without having to expand the commit.)
> > >
> > >
> > > Yes, it is also faster to write like that, on the other hand if the
> > > subject line is then too short to write something meaningful it is
> > > probably better to write it in the body instead.
> > >
> > > +1 to write it first thing in the subject line in all cases where it is
> > > possible, for very rare cases where it doesn't work it can still be in
> > > the body
> > >
> > > Jörn
> > >
> >
>


Re: [ANNOUNCE] Welcome our new committer Suneel Marthi

2017-01-03 Thread Tommaso Teofili
welcome Suneel!

Regards,
Tommaso

On Tue, 3 Jan 2017 at 19:26, Joern Kottmann wrote:

> Hi all,
>
> The Apache OpenNLP PMC is very pleased to announce that
> Suneel Marthi accepted our invitation to become an Apache OpenNLP
> committer.
>
> Suneel helped us with many PRs to get the 1.7.0 release out and had
> lots of advice on how to increase the development activity again.
>
> Congratulations, and welcome to the team!
>
> Jörn
>


Re: Next release

2017-01-01 Thread Tommaso Teofili
I would like to have OPENNLP-890 in for 1.7.1, I can work on it in a couple
of weeks.

Other than that I second more frequent releases; depending on dev pace
perhaps we could plan for a release every X months (3 sounds best to me,
but 6 is probably more adequate for our dev effort).

Regards,
Tommaso

On Sun, 1 Jan 2017 at 20:13, Joern Kottmann wrote:

> Hello all,
>
> now all the tests we do to release OpenNLP are automated and that
> allows us to also do more frequent releases.
>
> I would like to do a couple of releases this year and not just one,
> so the next one will probably be 1.7.1 and we should do it rather soon.
>
> There is a PR we can merge for it, and I will also have time to work on
> a couple of jira issues.
>
> If you have something you would like to get in please start working on
> it now and get the changes merged into trunk.
>
> Any opinions?
>
> Jörn
>


Re: OpenNLP 1.7.0 RC 2 is ready for testing

2017-01-01 Thread Tommaso Teofili
+1

Source build ok
Sigs ok
License & co ok

On Sun, 1 Jan 2017 at 03:02, Richard Eckart de Castilho <r...@apache.org> wrote:

> On 01.01.2017, at 02:41, Suneel Marthi  wrote:
> >
> > The release has been finalized - please find the 1.7.0 release artifacts at
> > http://www.apache.org/dist/opennlp/opennlp-1.7.0/
>
> Hm, I only saw two binding votes instead of the usual three [1].
>
>   Jörn: +1
>   William: +1
>   Suneel: +1 (non-binding)
>
> Did I miss a vote?
>
> I also checked the mailing list archive for additional votes [2].
>
> Cheers,
>
> -- Richard
>
> [1] http://apache.org/foundation/voting.html
> [2]
> http://mail-archives.apache.org/mod_mbox/opennlp-dev/201612.mbox/thread


Re: Update to Java 8

2016-12-19 Thread Tommaso Teofili
+1

Tommaso

On Mon, 19 Dec 2016 at 22:27, ARUN Thundyill Saseendran <ats0...@gmail.com> wrote:

> +1 to move to 1.8
>
> On Tue, Dec 20, 2016 at 2:51 AM, Suneel Marthi <
> suneel_mar...@yahoo.com.invalid> wrote:
>
> > +1 to move to Java 8
> >
> >
> >   From: Joern Kottmann 
> >  To: "dev@opennlp.apache.org" 
> >  Sent: Monday, December 19, 2016 8:45 AM
> >  Subject: Update to Java 8
> >
> > Hello all,
> >
> > Java 7 is already EOL.
> >
> > Should we update OpenNLP to Java 8 for the 1.7.0 release, any opinions?
> >
> > Jörn
> >
> >
>
>
>
>
> --
>


Re: Next release

2016-11-07 Thread Tommaso Teofili
+1 for 1.7

On Mon, 7 Nov 2016 at 19:27, William Colen wrote:

> +1 for 1.7
>
> William
>
> 2016-11-07 16:22 GMT-02:00 Joern Kottmann :
>
> > I am also in favor of 1.7.
> >
> > Jörn
> >
> > On Mon, 2016-11-07 at 18:01 +, Russ, Daniel (NIH/CIT) [E] wrote:
> > > Also the lemmatizer has significantly changed.  I vote 1.7
> > >
> > > On 11/7/16, 12:59 PM, "Joern Kottmann"  wrote:
> > >
> > > Hello all,
> > >
> > > since our last release it has been a while and we received quite a few
> > > changes which would be nice to get released.
> > >
> > > There are still some open Jira issues, but mostly smaller things that
> > > can be wrapped up rather quickly.
> > >
> > > Is there anything important missing which should go into the next
> > > release? Otherwise I think we should also aim for more frequent
> > > releases and just make one again early next year, with all the stuff we
> > > might miss out now.
> > >
> > > We took in a patch - as part of OPENNLP-830 - to replace our self-made
> > > hash table with the java.util.HashMap. This change is not backward
> > > compatible for folks who extend AbstractModel.
> > >
> > > Should we go with 1.6.1 as a next version or should we make 1.7.0 to
> > > reflect that?
> > >
> > > Previously we only had backward incompatible changes in versions which
> > > bumped the second number. Maybe that is the better choice. It will
> > > probably break some people's code when they update.
> > >
> > > We also have lots of deprecated API still in OpenNLP, should we try to
> > > remove as much as possible of it now?
> > > Jörn
> > >
> > >
> >
>


OpenNLP Github Mirror out of sync

2016-10-03 Thread Tommaso Teofili
Hi Infra,

it seems the OpenNLP Github Mirror [1] is currently out of sync, we
(committers) had pushed some commits in September but the history there
still points to May 24th, could you please check if that can be sync'ed ?

Thanks and regards,
Tommaso

[1] : https://github.com/apache/opennlp


Re: Access to Git

2016-09-30 Thread Tommaso Teofili
when did you push them ? Another project I'm involved in had the very same
problem, after contacting infra@ and doing a trivial commit the mirror
sync'ed again.

Regards,
Tommaso

On Fri, 30 Sep 2016 at 13:02, Rodrigo Agerri wrote:

> Hello,
>
> I have committed and pushed some changes to the git repo, but they do not
> appear in the GitHub mirror
>
> https://github.com/apache/opennlp
>
> or in the svn repo
>
> http://svn.apache.org/viewvc/opennlp/trunk/
>
> it does however appear in the original git repo
>
> https://git-wip-us.apache.org/repos/asf?p=opennlp.git;a=summary
>
> Is this intentional?
>
> Cheers,
>
> Rodrigo
>
> On Mon, Sep 19, 2016 at 11:50 PM, Joern Kottmann 
> wrote:
> > The opennlp-addons repo is now also available, and opennlp-sandbox will
> > be available soon.
> >
> > Jörn
> >
> >
> > On Thu, 2016-09-15 at 01:12 +0200, Joern Kottmann wrote:
> >> Sorry, it took me a little to figure this out.
> >>
> >> This link explains how it works:
> >> https://reference.apache.org/committer/git
> >>
> >> The reponame is opennlp, we will soon also have the other repos
> >> opennlp-addons and opennlp-sandbox.
> >>
> >> Jörn
> >>
> >> > On Fri, Sep 9, 2016 at 10:58 PM, Joern Kottmann wrote:
> >> > Hello, yes you can use it. The add-ons and other things are not
> >> > set up yet as far as I know; I have to ping the infra team about it.
> >> > Please have a look at the issue I posted to see how to access it.
> >> > I will work on this on Monday.
> >> > HTH
> >> >
> >> > Jörn
> >> >
> >> > > On Sep 9, 2016 19:10, "William Colen" <william.co...@gmail.com> wrote:
> >> > > Hello,
> >> > >
> >> > >
> >> > > Is the Git repository ready for use?
> >> > >
> >> > > Do we need to wait for it to develop new stuff?
> >> > >
> >> > >
> >> > >
> >> > > Thank you,
> >> > >
> >> > > William
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
>


Re: Fw: ApacheCon Europe 2016: Talk accepted!

2016-09-28 Thread Tommaso Teofili
Very interesting !
Thanks for letting us know.

Tommaso

On Wed, 28 Sep 2016 at 17:05, Boris Galitsky wrote:

>
> Hello
>
>  Just wanted to share I will be talking on how to do deep text analysis
> including discourse trees on top of OpenNLP
>
> Regards
> Boris
>
> ---
>
>
>
>
> Hi Boris Galitsky,
> We are pleased to tell you that your talk, "A Deep Text Analysis System
> Based on OpenNLP", has been accepted
> for ApacheCon Europe 2016.
>
> Please confirm that you are still able/willing to speak at this event.
>
> With regards,
> The team behind ApacheCon Europe 2016
> rbo...@apache.org
>


Re: Migrate to Git?

2016-07-04 Thread Tommaso Teofili
+1

On Mon, 4 Jul 2016 at 16:41, Madhawa Kasun Gunasekara <madhaw...@gmail.com> wrote:

> +1
>
> Madhawa
>
> On Mon, Jul 4, 2016 at 8:09 PM, Anthony Beylerian <
> anthony.beyler...@gmail.com> wrote:
>
> > +1
> >
> > On Mon, Jul 4, 2016 at 11:36 PM, Joern Kottmann 
> > wrote:
> >
> > > Hello all,
> > >
> > > do we still want to do this? Has been a while since we discussed it.
> > > I am happy to get it done if we reach consensus on it again.
> > >
> > > My +1 again.
> > >
> > > Jörn
> > >
> > > On Thu, Dec 20, 2012 at 4:40 PM, Tommaso Teofili <
> > > tommaso.teof...@gmail.com>
> > > wrote:
> > >
> > > > in my opinion that would be good, +1
> > > > Tommaso
> > > >
> > > >
> > > > 2012/12/19 Jörn Kottmann 
> > > >
> > > > > Hi all,
> > > > >
> > > > > > I heard at ApacheCon Europe that it should be possible to migrate from
> > > > > > Subversion to Git.
> > > > > >
> > > > > > Is there any interest in doing that? If we decide to do it I suggest to
> > > > > > wait until the 1.5.3 release is done so we have a bit of time to also
> > > > > > migrate our build process.
> > > > > >
> > > > > > Do all committers have experience with git?
> > > > > Jörn
> > > > >
> > > >
> > >
> >
>


Re: DeepLearning4J as a ML for OpenNLP

2016-06-28 Thread Tommaso Teofili
I had briefly looked into it a while ago, would be nice to collaborate
there.

Tommaso


On Tue, 28 Jun 2016 at 23:26, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Yep I think so - you may also look at SciSpark
> http://scispark.jpl.nasa.gov
> where we are using DL4J/ND4J and Breeze interchangeably here.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>
> On 6/28/16, 2:23 PM, "William Colen"  wrote:
>
> >Hi,
> >
> >Do you think it would be possible to implement a ML based on DL4J?
> >
> >http://deeplearning4j.org/
> >
> >Thank you
> >William
>


Re: Profiler for OpenNLP

2016-06-07 Thread Tommaso Teofili
+1 that sounds quite interesting.

Regards,
Tommaso

On Tue, 7 Jun 2016 at 20:03, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> We would love to have this as part of Apache Tika. You can take a look
> at the existing NER/NLP stuff integrated like in GeoTopicParser as
> an example and yes please file a JIRA issue:
>
> http://issues.apache.org/jira/browse/TIKA
>
> I would be happy to work with you to make it happen.
>
> See: http://github.com/apache/tika/#contributing-via-github
>
> For guidance.
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>
> On 6/7/16, 9:36 AM, "Anthony Beylerian" 
> wrote:
>
> >Hello,
> >
> >We are currently working on an experimental author profiler that we think
> >could be added to the toolkit.
> >
> >The profiler aims to detect the gender and age range of an author.
> >Later we hope to add personality aspects such as:
> >[extroverted, stable, agreeable, conscientious]
> >
> >We would like the team's opinion on the matter.
> >An initial code drop can be found here [1]. If someone is willing to
> >contribute/collaborate on it with us, please let us know.
> >
> >Thanks!
> >
> >[1] https://github.com/beylerian/profiler
>


Re: OPENNLP-837

2016-03-11 Thread Tommaso Teofili
Hi Jeffrey,

thanks for your contribution.
I'll have a look at it and comment on the issue.

Regards,
Tommaso


On Fri, 11 Mar 2016 at 14:29, Jeffrey Zemerick <jzemer...@apache.org> wrote:

> Hi all,
>
> I attached a patch to OPENNLP-837 (
> https://issues.apache.org/jira/browse/OPENNLP-837). With the patch, if the
> number of unique training events is zero a new exception
> (InsufficientTrainingDataException) is thrown. The model creation halts on
> the exception. As before, if you have any comments or want anything done
> differently please let me know!
>
> Thanks,
> Jeff
>
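
The behaviour Jeff describes (fail fast when there are zero unique training events) can be sketched as below. This is not the patch attached to OPENNLP-837: the exception here is a local stand-in that only borrows the name from his message, and representing events as strings is an assumption made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrainingGuardSketch {

    // Local stand-in; only the name is taken from the message above.
    public static class InsufficientTrainingDataException extends Exception {
        public InsufficientTrainingDataException(String message) {
            super(message);
        }
    }

    // Returns the number of unique events, or halts with an exception
    // when there is nothing to train on.
    public static int countUniqueEvents(List<String> events)
            throws InsufficientTrainingDataException {
        Set<String> unique = new HashSet<>(events);
        if (unique.isEmpty()) {
            throw new InsufficientTrainingDataException(
                    "Training data must contain at least one unique event");
        }
        return unique.size();
    }

    public static void main(String[] args) {
        try {
            System.out.println(countUniqueEvents(Arrays.asList("a", "b", "a")));
            countUniqueEvents(Collections.<String>emptyList());
        } catch (InsufficientTrainingDataException e) {
            System.out.println("Model creation halted: " + e.getMessage());
        }
    }
}
```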


Re: Language Model contribution

2016-02-18 Thread Tommaso Teofili
Hi Jörn,

good you're ok with the LanguageModel API; currently the only existing
implementation is the NGramLanguageModel.
In order to create such a model you add ngrams to it as in NGramModel:

> NGramLanguageModel languageModel = new NGramLanguageModel(3); // trigram language model
> languageModel.add(new StringList(tokens), 1, 3); // uni/bi/tri-grams for tokenized text (StringList)

Once done with adding ngrams you can compute the probability of e.g. a
tokenized sentence with:

> double p = languageModel.calculateProbability(new StringList("neural", "network", "language"));

Internally then it uses Laplace smoothing [1] for computing probabilities
if |ngrams| < 1M, otherwise it uses Stupid Backoff [2].
You can also use the LM to predict the next ngram given a sequence of
tokens (but that iterates over all the ngrams in order to find the most
probable and could be slow).

> StringList tokens = languageModel.predictNextTokens(new StringList(
>     "neural", "network", "language"));
> assertEquals(new StringList("models"), tokens);
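
To make the two estimators mentioned above concrete, here is a small
self-contained sketch of add-one (Laplace) smoothing and Stupid Backoff
over plain token lists. It uses the textbook formulas and the backoff
factor 0.4 from the Stupid Backoff paper [2]; the class SmoothingSketch
and its methods are hypothetical and are not the NGramLanguageModel code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SmoothingSketch {

  private final Map<List<String>, Integer> counts = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalTokens = 0;

  // Records every n-gram of length 1..maxN found in the token sequence.
  void add(List<String> tokens, int maxN) {
    for (int i = 0; i < tokens.size(); i++) {
      vocabulary.add(tokens.get(i));
      totalTokens++;
      for (int n = 1; n <= maxN && i + n <= tokens.size(); n++) {
        counts.merge(new ArrayList<>(tokens.subList(i, i + n)), 1, Integer::sum);
      }
    }
  }

  private int count(List<String> ngram) {
    return counts.getOrDefault(ngram, 0);
  }

  // Add-one estimate of P(last token | preceding tokens):
  // (count(ngram) + 1) / (count(context) + |vocabulary|).
  double laplace(List<String> ngram) {
    List<String> context = ngram.subList(0, ngram.size() - 1);
    int contextCount = context.isEmpty() ? totalTokens : count(context);
    return (count(ngram) + 1.0) / (contextCount + vocabulary.size());
  }

  // Stupid Backoff score: relative frequency if the n-gram was seen,
  // otherwise 0.4 times the score of the n-gram with its first token dropped.
  // Assumes at least one token has been added (otherwise the result is NaN).
  double stupidBackoff(List<String> ngram) {
    if (ngram.size() == 1) {
      return (double) count(ngram) / totalTokens;
    }
    int full = count(ngram);
    if (full > 0) {
      return (double) full / count(ngram.subList(0, ngram.size() - 1));
    }
    return 0.4 * stupidBackoff(ngram.subList(1, ngram.size()));
  }
}
```

After adding the tokens "the cat sat on the mat" with maxN = 3, the
Laplace estimate for ("the", "cat") is (1 + 1) / (2 + 5) = 2/7, while
Stupid Backoff scores the unseen trigram ("sat", "the", "mat") as
0.4 * (1/2) = 0.2. Note that Stupid Backoff yields scores, not normalized
probabilities.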

One can quickly have a look at its usage by looking at the
NgramLanguageModelTest#testTrigramLanguageModelCreationFromText [3].

Hope this helps and of course if there're any additional questions, feel
free to ask.
Regards,
Tommaso

[1] : https://en.wikipedia.org/wiki/Additive_smoothing
[2] : http://www.aclweb.org/anthology/D07-1090.pdf
[3] :
https://github.com/apache/opennlp/blob/trunk/opennlp-tools/src/test/java/opennlp/tools/languagemodel/NgramLanguageModelTest.java#L131

On Wed, 17 Feb 2016 at 19:39, Joern Kottmann
wrote:

> Oops, I confused the language model you were working on with language
> detection.
> I think the interface is good as it is.
>
> Jörn
>
> On Wed, Feb 17, 2016 at 10:00 AM, Joern Kottmann 
> wrote:
>
> > Hello,
> >
> > I saw the language model commit. Thanks for contributing that!
> >
> > Would it be possible to get a short introduction to it?
> >
> > The interface is supposed to take a StringList. Wouldn't it be better if
> > a user can just pass in a String instead? Otherwise he has to worry about
> > tokenizing a string in a language he doesn't know. I think that should be
> > the task of the language detector.
> >
> > Can we come up with another name for the package? Maybe langid/langdetect
> > or something similar? Any opinions?
> >
> > The Model in LanguageModel we usually use to refer to machine learning
> > models, maybe we could rename this interface to LanguageDetector.
> >
> > Jörn
> >
>


Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-13 Thread Tommaso Teofili
On Thu, 12 Nov 2015 at 20:50, Jason Baldridge <
jasonbaldri...@gmail.com> wrote:

> As one of the people who got OpenNLP started in the late 1990's (for
> research, but hoping it could be used by industry), it makes me smile to
> know that lots of people use it happily to this day. :)
>
> There are lots of new kids in town, but the licensing is often conflicted,
> and the biggest benefits often come---as Joern mentions---by having the
> right data to train your classifier.
>
> Having said that, there is a lot of activity in the deep learning space,
> where old techniques (neural nets) are now viable in ways they weren't
> previously, and they are outperforming linear classifiers in task after
> task. I'm currently looking at Deeplearning4J, and it would be great to
> have OpenNLP or a project like it make solid NLP models available based on
> deep learning methods, especially LSTMs and Convolutional Neural Nets.
> Deeplearning4J is Java/Scala friendly and it is ASL, so that's at least
> setting off on the right foot.
>
> http://deeplearning4j.org/
>
> The ND4J library (based on Numpy) that was built to support DL4J is also
> likely to be useful for other Java projects that use machine learning.
>

+1, thanks Jason, it's indeed an interesting field we should look into.
Another interesting technique based on neural networks is the one related
to word vectors (aka word embeddings) [1].

I agree with Joern it'd be interesting to see if we can provide an
integration with DL4J.

Regards,
Tommaso

[1] : https://en.wikipedia.org/wiki/Word_embedding


>
> -Jason
>
> On Thu, 12 Nov 2015 at 09:44 Russ, Daniel (NIH/CIT) [E] <
> dr...@mail.nih.gov>
> wrote:
>
> > Chris,
> > Joern is correct.  However, I would slightly disagree on a few minor
> > points.
> >
> > 1) I use the old SourceForge models.  I find that the sources of error in
> > my analysis are usually not due to mistakes in sentence detection or POS
> > tagging.  I don’t have the annotated data or the time/money to build
> > custom models.  Yes, the text I analyze is quite different from the corpus
> > used to build the models (WSJ, or whatever it was), but it is good enough.
> >
> > 2) MaxEnt is still a good classifier for NLP, and L-BFGS is just an
> > algorithm to calculate the weights for the features.  It is an
> > improvement on GIS, not a different classifier.  I am not familiar enough
> > with CRFs to comment, but the seminal paper by Della Pietra (IEEE Trans.
> > Pattern Anal. Mach. Intell., vol. 19, no. 4, 1997) makes it appear to be
> > an extension of MaxEnt.  The Stanford NLP group has PPT lectures online
> > explaining why discriminative classification methods (e.g. MaxEnt) work
> > better than generative (Naive Bayes) models (see
> > https://web.stanford.edu/class/cs124/lec/Maximum_Entropy_Classifiers.pdf;
> > particularly the example with traffic lights).
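
To illustrate point 2 above: whichever optimizer (GIS or L-BFGS) produced
the weights, a MaxEnt model turns them into probabilities the same way,
by exponentiating the summed weights of the active features per outcome
and normalizing. The sketch below assumes a plain nested map of weights;
MaxentSketch and its method are hypothetical names, not OpenNLP API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MaxentSketch {

  // weights.get(outcome).get(feature) is the learned weight for that pair.
  // Returns P(outcome | activeFeatures) for every outcome in the weight map.
  static Map<String, Double> probabilities(
      Map<String, Map<String, Double>> weights, List<String> activeFeatures) {
    Map<String, Double> scores = new LinkedHashMap<>();
    double normalizer = 0.0;
    for (Map.Entry<String, Map<String, Double>> entry : weights.entrySet()) {
      double sum = 0.0;
      for (String feature : activeFeatures) {
        // Features without a weight for this outcome contribute nothing.
        sum += entry.getValue().getOrDefault(feature, 0.0);
      }
      double score = Math.exp(sum);
      scores.put(entry.getKey(), score);
      normalizer += score;
    }
    // Divide by the sum of all scores so the values add up to 1.
    for (Map.Entry<String, Double> entry : scores.entrySet()) {
      entry.setValue(entry.getValue() / normalizer);
    }
    return scores;
  }
}
```

With a weight of ln 3 for the pair (VERB, "suffix=ing") and 0 for
(NOUN, "suffix=ing"), the active feature "suffix=ing" gives P(VERB) = 0.75
and P(NOUN) = 0.25. Changing the training algorithm changes the weights,
not this formula.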
> >
> > As I briefly mentioned earlier, OpenNLP is a mature product.  It has
> > undergone some MAJOR upgrades.  It is not obsolete.  As for the other
> > tools/libraries, they are also fine products.  I use the Stanford parser
> > to get dependency information; OpenNLP just does not do it.  I don’t use
> > NLTK because I don’t need to.  If the need arises, I will.  I assume that
> > you don’t have the time and money to learn every new NLP product.  I would
> > say play to your strengths.  If you know the package, use it.  Don’t
> > change because it’s trendy.
> >
> >
> >
> > Daniel Russ, Ph.D.
> > Staff Scientist, Division of Computational Bioscience
> > Center for Information Technology
> > National Institutes of Health
> > U.S. Department of Health and Human Services
> > 12 South Drive
> > Bethesda,  MD 20892-5624
> >
> > On Nov 11, 2015, at 4:41 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> >
> > Hello,
> >
> >
> >
> > It is definitely true that OpenNLP has existed for a long time (more than
> > 10 years), but that doesn't mean it wasn't improved. Actually it changed a
> > lot in that period.
> >
> > The core strength of OpenNLP was always that it can be used really easily
> > to perform one of the supported NLP tasks.
> >
> > This was further improved with the 1.5 release adding model packages
> > that ensure that the components are always instantiated correctly across
> > different runtime environments.
> >
> > The problem is that the system used to perform the training of a model
> > and the system used to run it can be quite different. Prior to 1.5 it was
> > possible to get that wrong, which resulted in hard-to-notice performance
> > problems.
> > I suspect that is an issue many of the competing solutions still have
> > today.
> >
> > An example is the usage of String.toLowerCase(). Its output
> > depends on the platform locale.
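
The String.toLowerCase() point can be reproduced in a few lines. The
sketch below passes the locale explicitly, which is one common way to
keep training and runtime behaviour identical; the class name
LowerCaseLocaleSketch is hypothetical and unrelated to how OpenNLP's
model packages handle this.

```java
import java.util.Locale;

class LowerCaseLocaleSketch {

  // Lower-cases with an explicit locale so the result does not depend on
  // the default locale of the JVM that happens to run the code.
  static String lower(String text, Locale locale) {
    return text.toLowerCase(locale);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");
    // Same input, two locales: the resulting strings are not equal.
    System.out.println(lower("TITLE", Locale.ROOT));
    System.out.println(lower("TITLE", turkish));
    System.out.println(lower("TITLE", Locale.ROOT).equals(lower("TITLE", turkish)));
  }
}
```

Under Locale.ROOT the result is "title"; under the Turkish locale the
capital I maps to the dotless i (U+0131), so a feature string generated
on one machine would not match the one stored in a model trained on
another.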
> >
> > One of the things that got dated a bit was the machine learning part of
> > OpenNLP; this was addressed by adding more algorithms (e.g. perceptron
> > and L-BFGS maxent). In addition the machine learning part is now
> > pluggable and can easily be s

Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

2015-07-20 Thread Tommaso Teofili
Hi Cohan,

for future reference it'd be better if you could attach the patch to the
Jira issue. I can have a look at your patch and comment later this week.
Usually we mark an issue as fixed as soon as the relevant code is committed.

Regards,
Tommaso


2015-07-18 15:36 GMT+02:00 Cohan Sujay Carlos :

> Yet another patch (ouch, yeah, like they say, "in product development,
> you're never done, till you're done!").
>
> Reason: Error in one of the test-cases.  The test-case
> DocumentCategorizerNBTest is now correct and runs fine.
>
> Cohan Sujay Carlos
> CEO, Aiaioo Labs
> +91-77605-80015
>
> On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos 
> wrote:
>
>> A small update to the patch (I removed a superfluous piece of code).
>>
>> In the earlier patch, I had used a subclass of
>> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB
>> that was functionally identical.  I removed that subclass since it
>> wasn't essential (DoccatModel does the trick just fine).
>>
>> Is there anything else I need to do?
>>
>> Is someone on the dev team going to be responsible for incorporating the
>> patch into the codebase?
>>
>> Can I mark this Jira issue fixed (for target 1.6.1?).
>>
>> Cohan Sujay Carlos
>> CEO, Aiaioo Labs
>> +91-77605-80015
>>
>>
>> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos 
>> wrote:
>>
>>> I have gone ahead and written the test-cases and verified that the Naive
>>> Bayes Classifier works correctly.
>>>
>>> Here is the latest patch (attached) with the test-cases and everything.
>>>
>>> In implementing the Naive Bayes classifier, we tried to *ensure minimal
>>> disruption* to existing code.
>>>
>>> The *only* changes to existing code are as follows:
>>>
>>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to
>>> include a new model type:
>>>
>>> line 35: public enum ModelType
>>> {Maxent,Perceptron,MaxentQn,NaiveBayes};
>>>
>>> 2. The opennlp.tools.ml.model.GenericModelReader class has been changed
>>> in one place:
>>>
>>> line 53:
>>> else if (modelType.equals("NaiveBayes")) { delegateModelReader = new
>>> NaiveBayesModelReader(this.dataReader); }
>>>
>>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been changed
>>> in two places:
>>>
>>> line 79:
>>> if (model.getModelType() == ModelType.NaiveBayes) { delegateWriter =
>>> new BinaryNaiveBayesModelWriter(model,dos); }
>>>
>>> line 91:
>>> if (model.getModelType() == ModelType.NaiveBayes) { delegateWriter =
>>> new PlainTextNaiveBayesModelWriter(model,bw); }
>>>
>>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has
>>> been changed in one place to add the Naive Bayes trainer:
>>>
>>> line 51:
>>> _trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE,
>>> NaiveBayesTrainer.class);
>>>
>>> That was it!
>>>
>>> We didn't change anything else in the existing OpenNLP code.
>>>
>>> All the new code for the Naive Bayesian classifier sits in the package
>>> opennlp.tools.ml.naivebayes - just above the perceptron
>>>
>>> The code for the document categorizer using the Naive Bayesian
>>> classifier can be found in opennlp.tools.doccat (we didn't have to
>>> change any existing code). The new doccat is called
>>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the
>>> maxent document categorizer, which is DocumentCategorizerME).
>>>
>>> Proof of correctness!
>>>
>>> I have included two testcases:
>>>
>>> 1. A test to validate the document categorizer - under the tests folder,
>>> you will find opennlp.tools.doccat.DocumentCategorizerNBTest - which
>>> runs the same tests that were run on the ME document categorizer, but on
>>> the Naive Bayes categorizer instead (all tests passed).
>>>
>>> 2. A test to check the mathematical correctness of the Naive Bayes
>>> implementation can be found in
>>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
>>>
>>> So, the inclusion of this code will minimally impact any existing code.
>>>
>>> And the code in this patch contains a multinomial Naive Bayesian
>>> classifier that is verifiably correct.
>>>
>>> Is there anything else I have to do to have this patch pulled into the
>>> OpenNLP code base (for say 1.7.0)?
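
For readers unfamiliar with the technique, a multinomial Naive Bayes
classifier of the kind mentioned above can be sketched in a few dozen
lines. This is a textbook version with add-one smoothing (the smoothing
choice is an assumption, the patch description does not state one);
NaiveBayesSketch is a hypothetical class and not the code in
opennlp.tools.ml.naivebayes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {

  private final Map<String, Integer> docsPerLabel = new HashMap<>();
  private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
  private final Map<String, Integer> tokensPerLabel = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  // Adds one labelled document, given as a list of tokens.
  void train(String label, List<String> tokens) {
    totalDocs++;
    docsPerLabel.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
      tokensPerLabel.merge(label, 1, Integer::sum);
      vocabulary.add(token);
    }
  }

  // log P(label) + sum over tokens of log P(token | label), add-one smoothed.
  // The label must have been seen during training.
  double logScore(String label, List<String> tokens) {
    double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
    Map<String, Integer> counts = tokenCounts.get(label);
    int labelTokens = tokensPerLabel.getOrDefault(label, 0);
    for (String token : tokens) {
      int c = counts.getOrDefault(token, 0);
      score += Math.log((c + 1.0) / (labelTokens + vocabulary.size()));
    }
    return score;
  }

  // Returns the trained label with the highest log score, or null if untrained.
  String classify(List<String> tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docsPerLabel.keySet()) {
      double score = logScore(label, tokens);
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}
```

Working in log space avoids underflow when many small probabilities are
multiplied, which is why the scores are summed rather than multiplied.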
>>>
>>> Cohan Sujay Carlos
>>> CEO, Aiaioo Labs
>>> +91-77605-80015
>>>
>>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos 
 wrote:

> Tommaso,
>
> I have created the Jira issue:
> https://issues.apache.org/jira/browse/OPENNLP-777
>
>


Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

2015-05-19 Thread Tommaso Teofili
Hi Cohan,

I think that'd be a very valuable contribution, as NB is one of the
foundation algorithms, often used as basis for comparisons.
It would be good if you could create a Jira issue and provide more details
about the implementation and, eventually, a patch.

Thanks and regards,
Tommaso

2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos :

> I have a question for the OpenNLP project team.
>
> I was wondering if there is a Naive Bayesian classifier implementation in
> OpenNLP that I've not come across, or if there are plans to implement one.
>
> If it is the latter, I should love to contribute an implementation.
>
> There is an ME classifier already available in OpenNLP, of course, but I
> felt that there was an unmet need for a Naive Bayesian (NB) classifier
> implementation to be offered as well.
>
> An NB classifier could be bootstrapped up with partially labelled training
> data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> Classification from Labeled and Unlabeled Documents using EM".
>
> So, if there isn't an NB code base out there already, I'd be happy to
> contribute a very solid implementation that we've used in production for a
> good 5 years.
>
> I'd have to adapt it to load the same training data format as the ME
> classifier, but I guess that shouldn't be very difficult to do.
>
> I was wondering if there was some interest in adding an NB implementation
> and I'd love to know who could I coordinate with if there is?
>
> Cohan Sujay Carlos
> CEO, Aiaioo Labs, India
> +91-77605-80015 +91-80-4125-0730
>


Re: svn commit: r1670574 - /opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java

2015-04-01 Thread Tommaso Teofili
I think you're right, I'll revert it and solve OPENNLP-764 as not a
problem, sorry for the noise.

Tommaso

2015-04-01 15:05 GMT+02:00 Joern Kottmann :

> The adaptive data is cleared in the documentDone method. The statement in
> the issue that it is not cleared is not true afaik.
>
> Jörn
>
> On Wed, Apr 1, 2015 at 9:47 AM,  wrote:
>
> > Author: tommaso
> > Date: Wed Apr  1 07:47:41 2015
> > New Revision: 1670574
> >
> > URL: http://svn.apache.org/r1670574
> > Log:
> > OPENNLP-764 - applied patch from Pablo Duboue, clearing adaptive data
> > after doc processing
> >
> > Modified:
> >
> >
> opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
> >
> > Modified:
> >
> opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
> > URL:
> >
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java?rev=1670574&r1=1670573&r2=1670574&view=diff
> >
> >
> ==
> > ---
> >
> opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
> > (original)
> > +++
> >
> opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
> > Wed Apr  1 07:47:41 2015
> > @@ -169,6 +169,8 @@ public final class NameFinder extends Ab
> >documentConfidence.add(prob);
> >  }
> >
> > +mNameFinder.clearAdaptiveData();
> > +
> >  return names;
> >}
> >
> > @@ -210,4 +212,4 @@ public final class NameFinder extends Ab
> >public void destroy() {
> >  mNameFinder = null;
> >}
> > -}
> > \ No newline at end of file
> > +}
> >
> >
> >
>


Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-30 Thread Tommaso Teofili
thanks Jörn I'll follow your suggestion and change the test impl
accordingly.

Regards,
Tommaso

2015-01-29 13:16 GMT+01:00 Joern Kottmann :

> In those serialization tests I usually write the Object into a byte
> buffer, create it again from the byte buffer and then compare the two
> objects, instead of the binary representation.
>
> Could that solve the problem we have in this test?
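
The round-trip style of test described above can be sketched as follows:
serialize into an in-memory buffer, read the object back, and compare the
two objects with equals() rather than comparing serialized text. The
class CountModelSketch and its binary format are hypothetical stand-ins,
not NGramModel or its XML format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class CountModelSketch {

  final Map<String, Integer> counts = new HashMap<>();

  // Writes the entry count followed by each (key, value) pair.
  void serialize(DataOutputStream out) throws IOException {
    out.writeInt(counts.size());
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      out.writeUTF(entry.getKey());
      out.writeInt(entry.getValue());
    }
    out.flush();
  }

  // Reads a model back in the same order it was written.
  static CountModelSketch deserialize(DataInputStream in) throws IOException {
    CountModelSketch model = new CountModelSketch();
    int size = in.readInt();
    for (int i = 0; i < size; i++) {
      String key = in.readUTF();
      int value = in.readInt();
      model.counts.put(key, value);
    }
    return model;
  }

  // Serializes into a byte buffer and recreates the object from it.
  static CountModelSketch roundTrip(CountModelSketch original) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    original.serialize(new DataOutputStream(buffer));
    return deserialize(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof CountModelSketch
        && counts.equals(((CountModelSketch) other).counts);
  }

  @Override
  public int hashCode() {
    return counts.hashCode();
  }
}
```

A test then only needs to assert that roundTrip(model) equals model.
Because the comparison is on object state, differences in entry order,
line endings or platform encoding in the serialized form no longer make
the test fail.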
>
> Jörn
>
> On Thu, 2015-01-29 at 12:11 +0100, Tommaso Teofili wrote:
> > I've just disabled that test, I'll fix it and re-enable it when done.
> >
> > Regards,
> > Tommaso
> >
> > 2015-01-29 10:51 GMT+01:00 Joern Kottmann :
> >
> > > It still fails in the assert. I didn't check but I guess the build
> > > server has the same problem.
> > >
> > > Jörn
> > >
> > > On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote:
> > > > even after my latest commit? If so I'll rearrange the test a bit.
> > > >
> > > > Tommaso
> > > >
> > > > 2015-01-29 10:21 GMT+01:00 Joern Kottmann :
> > > >
> > > > > Or if that is a problem for the test, you could also tell RAT to
> > > > > ignore it.
> > > > >
> > > > > On my machine the test fails. The two strings don't match.
> > > > >
> > > > > Jörn
> > > > >
> > > > > On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
> > > > > > right, thanks I'll fix both.
> > > > > >
> > > > > > Tommaso
> > > > > >
> > > > > > 2015-01-29 9:54 GMT+01:00 Joern Kottmann :
> > > > > >
> > > > > > > This file should have an AL header.
> > > > > > >
> > > > > > > Jörn
> > > > > > >
> > > > > > > On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
> > > > > > > > Added:
> > > > > > > >
> > > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > > > > URL:
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
> > > > > > > >
> > > > > > >
> > > > >
> > >
> ==
> > > > > > > > ---
> > > > > > > >
> > > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > > > (added)
> > > > > > > > +++
> > > > > > > >
> > > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > > > Thu Jan 29 08:02:31 2015
> > > > > > > > @@ -0,0 +1,58 @@
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +brown
> > > > > > > > +fox
> > > > > > > > +
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
>
>
>


Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
I've just disabled that test, I'll fix it and re-enable it when done.

Regards,
Tommaso

2015-01-29 10:51 GMT+01:00 Joern Kottmann :

> It still fails in the assert. I didn't check but I guess the build
> server has the same problem.
>
> Jörn
>
> On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote:
> > even after my latest commit? If so I'll rearrange the test a bit.
> >
> > Tommaso
> >
> > 2015-01-29 10:21 GMT+01:00 Joern Kottmann :
> >
> > > Or if that is a problem for the test, you could also tell RAT to ignore
> > > it.
> > >
> > > On my machine the test fails. The two strings don't match.
> > >
> > > Jörn
> > >
> > > On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
> > > > right, thanks I'll fix both.
> > > >
> > > > Tommaso
> > > >
> > > > 2015-01-29 9:54 GMT+01:00 Joern Kottmann :
> > > >
> > > > > This file should have an AL header.
> > > > >
> > > > > Jörn
> > > > >
> > > > > On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
> > > > > > Added:
> > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > > URL:
> > > > > >
> > > > >
> > >
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
> > > > > >
> > > > >
> > >
> ==
> > > > > > ---
> > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > (added)
> > > > > > +++
> > > > > >
> > > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > > Thu Jan 29 08:02:31 2015
> > > > > > @@ -0,0 +1,58 @@
> > > > > > +
> > > > > > +
> > > > > > +
> > > > > > +brown
> > > > > > +fox
> > > > > > +
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
>
>
>


Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
even after my latest commit? If so I'll rearrange the test a bit.

Tommaso

2015-01-29 10:21 GMT+01:00 Joern Kottmann :

> Or if that is a problem for the test, you could also tell RAT to ignore
> it.
>
> On my machine the test fails. The two strings don't match.
>
> Jörn
>
> On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
> > right, thanks I'll fix both.
> >
> > Tommaso
> >
> > 2015-01-29 9:54 GMT+01:00 Joern Kottmann :
> >
> > > This file should have an AL header.
> > >
> > > Jörn
> > >
> > > On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
> > > > Added:
> > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > > URL:
> > > >
> > >
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
> > > >
> > >
> ==
> > > > ---
> > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > (added)
> > > > +++
> > > >
> > >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > > Thu Jan 29 08:02:31 2015
> > > > @@ -0,0 +1,58 @@
> > > > +
> > > > +
> > > > +
> > > > +brown
> > > > +fox
> > > > +
> > >
> > >
> > >
>
>
>


Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
right, thanks I'll fix both.

Tommaso

2015-01-29 9:54 GMT+01:00 Joern Kottmann :

> This file should have an AL header.
>
> Jörn
>
> On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
> > Added:
> >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> > URL:
> >
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
> >
> ==
> > ---
> >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> (added)
> > +++
> >
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
> Thu Jan 29 08:02:31 2015
> > @@ -0,0 +1,58 @@
> > +
> > +
> > +
> > +brown
> > +fox
> > +
>
>
>


Re: svn commit: r1655238 - /opennlp/trunk/

2015-01-28 Thread Tommaso Teofili
I guess I removed the previous ignore entries for the Eclipse files by
mistake, sorry for the inconvenience.

Tommaso

2015-01-28 9:48 GMT+01:00 :

> Author: joern
> Date: Wed Jan 28 08:48:25 2015
> New Revision: 1655238
>
> URL: http://svn.apache.org/r1655238
> Log:
> Added eclipse files to svn:ignore.
>
> Modified:
> opennlp/trunk/   (props changed)
>
> Propchange: opennlp/trunk/
>
> --
> --- svn:ignore (original)
> +++ svn:ignore Wed Jan 28 08:48:25 2015
> @@ -1,2 +1,6 @@
>  *.iml
>  .idea
> +
> +.settings
> +
> +.project
>
>
>


Re: Text Summarization module?

2015-01-22 Thread Tommaso Teofili
Hi Ram,

since your proposal got positive feedback, maybe you could create an issue
in Jira and attach the code / patch for discussion / review.

Regards,
Tommaso


Re: Comment from December 2014 board meeting about singling out individuals in reports

2015-01-21 Thread Tommaso Teofili
Hi Chris,

thanks, your (Board's) message is clear and I think it makes sense.

Regards,
Tommaso

2015-01-21 8:04 GMT+01:00 Chris Mattmann :

> Hi OpenNLP PMC,
>
> I was tasked with carrying forward a comment from one of the
> ASF directors in the December 2014 board meeting. The concern
> was related to singling out individuals in the report regarding
> their activity - and the board’s interest in project activity
> more so than individual activity. I think the comment was intended
> to suggest in future reports there is no need to directly identify
> the release manager but rather to simply state there was a release,
> it was numbered  and was released on date .
>
> Thanks and wanted to carry that forward.
>
> Cheers,
> Chris
> (on behalf of the Apache Board)
>
>
>


Re: Word Sense Disambiguation

2015-01-19 Thread Tommaso Teofili
+1

Tommaso

2015-01-19 19:10 GMT+01:00 Joern Kottmann :

> Hello,
>
> +1 from me to just go ahead and implement the proposed approach. One
> goal of this implementation will be to figure out the interface we want
> to have in OpenNLP for WSD.
>
> We can later extend OpenNLP with more implementations which are taking
> different approaches.
>
> Jörn
>
> On Thu, 2015-01-15 at 16:50 +0900, Anthony Beylerian wrote:
> > Hello,
> >
> > I'm new here. I previously mentioned to Jörn that my colleagues and I
> > are interested in helping to implement this component. We were thinking
> > of starting with simple knowledge-based approaches: they do not yield
> > high accuracy, but as a first step they are relatively simple. We would
> > like your opinion.
> >
> > Pei also mentioned "cTAKES (
> > http://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-wsd/ currently very
> > exploratory stages here) and YTEX (
> > https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08) is also
> > just exploring WSD for the healthcare domain. It's also currently
> > knowledge/ontology base for now... It would be great to see if OpenNLP
> > supports a general domain WSD"
> >
> > Best,
> >
> > Anthony
> >
>
>
>


Re: Build changed opennlp/pom.xml moved to root directory

2014-11-20 Thread Tommaso Teofili
IMHO it was about time, thanks Jörn :-)

Regards,
Tommaso

2014-11-20 21:11 GMT+01:00 Joern Kottmann :

> Hello everybody,
>
> we changed the structure of the project slightly. The main pom.xml used
> to be located in opennlp/pom.xml. This was done because an Eclipse
> workspace can't have files at the root level. The Maven convention is to
> have the file at the root level. I think it is time to move this file to
> the root directory so that it no longer confuses Maven users (and maybe
> some tools) which expect the file in the root directory.
>
> Please let me know if there are any objections to this.
>
> To build OpenNLP from now on just go to the trunk directory and type "mvn
> install".
>
> Jörn
>
>


Re: What should we do with the SF models?

2014-10-29 Thread Tommaso Teofili
In my opinion the long term goal should be to work on training new, Apache2
licensed, ones and make them available to our users; it probably makes sense
to take the SF models offline in any case because as long as they are there
people will keep downloading and using them, as that's just much easier
than training new ones.
As a short term goal I agree we should give more visibility to instructions
on how to train new models using existing corpora.

My 2 cents,
Tommaso


2014-10-28 20:37 GMT+01:00 Gustavo Knuppe :

> I believe that models are important for users, since not every user has
> access to appropriate data files to train basic models.
>
> My suggestion is to use an alternative service to host these models,
> like github, torrent or other file share service...
>
> Github is a good option since they don't have any quota or bandwidth
> limitation.
>
> Gustvo K.
>
> 2014-10-28 15:19 GMT-02:00 Joern Kottmann :
>
> > Hi all,
> >
> > OpenNLP always came with a couple of trained models which were ready to
> > use for a few languages. The performance a user encounters with those
> > models heavily depends on their input text.
> >
> > Especially the English name finder models which were trained on MUC 6/7
> > data perform very poorly these days if run on current news articles and
> > even worse on data which is not in the news domain.
> >
> > Anyway, we often get judged on how well OpenNLP works just based on the
> > performance of those models (or maybe people who compare their NLP
> > systems against OpenNLP just love to have OpenNLP perform badly).
> >
> > I think we are now at a point with those models where it is questionable
> > if having them is still an advantage for OpenNLP. The SourceForge page
> > is often blocked due to traffic limitations. We definitely have to act
> > somehow.
> >
> > The old models have definitely some historic value and are used for
> > testing the release.
> >
> > What should we do?
> >
> > We could take them offline and advise our users to train their own
> > models on one of the various corpora we support. We could also do both
> > and place a prominent link to our corpora documentation on the download
> > page and in a less visible place a link to the historic SF models.
> >
> > Jörn
> >
> >
>


Re: OpenNLP Similarity release (Was: Re: Build failed in Jenkins: OpenNLP #476)

2014-10-28 Thread Tommaso Teofili
+1

Tommaso

2014-10-28 9:26 GMT+01:00 Rodrigo Agerri :

> +1 to release the similarity component and making an addon release.
>
> R
>
> On Tue, Oct 28, 2014 at 7:46 AM, Jörn Kottmann  wrote:
> > Yes it would be great to get it released.
> >
> > I suggest we move it from the sandbox to the addons and then we make an
> > addons
> > release.
> >
> > Any opinions?
> >
> > Jörn
> >
> > On 10/27/2014 11:54 PM, Boris Galitsky wrote:
> >>
> >> Hi guys
> >>
> >>since you are talking about the build - when this project is moved to
> >> github, would I have a chance to try to deploy
> >> OpenNLP.Similarity?
> >>   I struggled for some time to deploy it a couple of years back.
> >>
> >> Regards
> >> Boris
> >>
> >> 
> >>>
> >>> Subject: Re: Build failed in Jenkins: OpenNLP #476
> >>> From: kottm...@gmail.com
> >>> To: dev@opennlp.apache.org
> >>> Date: Mon, 27 Oct 2014 22:50:17 +0100
> >>>
> >>> On Mon, 2014-10-27 at 19:15 +, Rodrigo Agerri wrote:
> 
>  Hi,
> 
>  This is not caused by my latest commit, is it not?
> >>>
> >>> Your last commit just triggered the build.
> >>> The build itself was successful. It failed afterwards when it tried to
> >>> deploy the artifacts to the snapshot repo with: "503 Service
> >>> Temporarily Unavailable"
> >>>
> >>> It probably works if we trigger it.
> >>>
> >>> Jörn
> >>>
> >>
> >
> >
>


Re: Parsing with PCFGs

2014-10-17 Thread Tommaso Teofili
Ok, no problem.
In the meantime I've added the first PCFG implementation in the sandbox,
see http://svn.apache.org/r1632735

Regards,
Tommaso

2014-10-16 11:33 GMT+02:00 Rodrigo Agerri :

> Hello!
>
> No, unfortunately not :)
>
> Cheers,
>
> Rodrigo
>
> On Thu, Oct 16, 2014 at 9:20 AM, Tommaso Teofili
>  wrote:
> > Hi Rodrigo,
> >
> > thanks a lot for your inputs, do you have insights on the "treeinsert"
> > algorithm [1] too?
> >
> > Thanks,
> > Tommaso
> >
> > [1] :
> >
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.parsing
> >
> > 2014-10-15 9:38 GMT+02:00 Rodrigo Agerri :
> >
> >> Hi,
> >>
> >> The main algorithm (called chunking in the trunk) is based on
> >> Ratnaparkhi's work.
> >> It is best to directly read the paper.
> >>
> >> http://link.springer.com/article/10.1023/A:1007502103375
> >>
> >> This is a shift-reduce parser, a kind that is incidentally becoming quite
> >> fashionable again. For example, Stanford CoreNLP recently released a
> >> shift-reduce parser themselves, as an alternative to their PCFGs,
> >> lexicalized parser.
> >>
> >> HTH,
> >>
> >> Rodrigo
> >>
> >> On Wed, Oct 15, 2014 at 9:32 AM, Tommaso Teofili
> >>  wrote:
> >> > Hi all,
> >> >
> >> > in a bit of spare time I sketched a basic implementation of (in
> >> > memory) probabilistic context free grammars which, if properly trained,
> >> > can be used to build the parse tree of a given sentence. However (also
> >> > looking at the doc on the website) it's not completely clear what's
> >> > already implemented in trunk. I see there are 2 algorithms for parsing;
> >> > could someone shed some light on them? And possibly share an opinion on
> >> > adding PCFGs as an additional algorithm?
> >> >
> >> > Regards,
> >> > Tommaso
> >>
>


Re: Parsing with PCFGs

2014-10-16 Thread Tommaso Teofili
Hi Rodrigo,

thanks a lot for your inputs, do you have insights on the "treeinsert"
algorithm [1] too?

Thanks,
Tommaso

[1] :
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.parsing

2014-10-15 9:38 GMT+02:00 Rodrigo Agerri :

> Hi,
>
> The main algorithm (called chunking in the trunk) is based on
> Ratnaparkhi's work.
> It is best to directly read the paper.
>
> http://link.springer.com/article/10.1023/A:1007502103375
>
> This is a shift-reduce parser; such parsers are incidentally becoming
> quite fashionable again. For example, Stanford CoreNLP recently released
> a shift-reduce parser themselves, as an alternative to their lexicalized
> PCFG parser.
>
> HTH,
>
> Rodrigo
>
> On Wed, Oct 15, 2014 at 9:32 AM, Tommaso Teofili
>  wrote:
> > Hi all,
> >
> > in a bit of spare time I sketched a basic (in-memory) implementation of
> > probabilistic context-free grammars which, if properly trained, can be
> > used to build the parse tree of a given sentence. However (also looking
> > at the doc on the website), it's not completely clear what's already
> > implemented in trunk. I see there are 2 algorithms for parsing; could
> > someone shed some light on them? And possibly give an opinion on adding
> > PCFGs as an additional algorithm?
> >
> > Regards,
> > Tommaso
>


Parsing with PCFGs

2014-10-15 Thread Tommaso Teofili
Hi all,

in a bit of spare time I sketched a basic (in-memory) implementation of
probabilistic context-free grammars which, if properly trained, can be used
to build the parse tree of a given sentence. However (also looking at the
doc on the website), it's not completely clear what's already implemented in
trunk. I see there are 2 algorithms for parsing; could someone shed some
light on them? And possibly give an opinion on adding PCFGs as an
additional algorithm?

Regards,
Tommaso
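
For readers unfamiliar with the idea in the message above, here is a minimal
sketch of an in-memory PCFG that builds the most probable parse tree with the
CKY algorithm. It is not the code that was committed to the sandbox and not an
OpenNLP API: the class ToyPcfg, its methods rule, word and parse, and the tiny
hand-written grammar are all assumptions made for illustration. The grammar is
assumed to be in Chomsky normal form, and the probabilities are set by hand
rather than trained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy PCFG (Chomsky normal form) with a probabilistic CKY parser. */
final class ToyPcfg {
    /** Best analysis found for one symbol over one span of the sentence. */
    static final class Entry {
        final double p;     // probability of the subtree
        final String tree;  // bracketed subtree, e.g. "(NP (Det the) (N dog))"
        Entry(double p, String tree) { this.p = p; this.tree = tree; }
    }

    /** parent -> left right (binary rule) or parent -> left (lexical rule, right == null). */
    private static final class Rule {
        final String parent, left, right;
        final double p;
        Rule(String parent, String left, String right, double p) {
            this.parent = parent; this.left = left; this.right = right; this.p = p;
        }
    }

    private final List<Rule> binary = new ArrayList<>();
    private final List<Rule> lexical = new ArrayList<>();

    /** Adds a binary rule parent -> left right with probability p. */
    ToyPcfg rule(String parent, String left, String right, double p) {
        binary.add(new Rule(parent, left, right, p));
        return this;
    }

    /** Adds a lexical rule tag -> word with probability p. */
    ToyPcfg word(String tag, String word, double p) {
        lexical.add(new Rule(tag, word, null, p));
        return this;
    }

    /** Returns the most probable tree rooted at start, or null if the sentence has no parse. */
    Entry parse(String start, String... words) {
        int n = words.length;
        // chart maps "i:j:symbol" to the best entry for that symbol over words[i..j] inclusive
        Map<String, Entry> chart = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (Rule r : lexical) {
                if (r.left.equals(words[i])) {
                    keepBest(chart, key(i, i, r.parent), r.p, "(" + r.parent + " " + words[i] + ")");
                }
            }
        }
        // Combine two adjacent smaller spans into a larger one, shortest spans first.
        for (int len = 2; len <= n; len++) {
            for (int i = 0; i + len <= n; i++) {
                int j = i + len - 1;
                for (int k = i; k < j; k++) {
                    for (Rule r : binary) {
                        Entry l = chart.get(key(i, k, r.left));
                        Entry rt = chart.get(key(k + 1, j, r.right));
                        if (l != null && rt != null) {
                            keepBest(chart, key(i, j, r.parent), r.p * l.p * rt.p,
                                    "(" + r.parent + " " + l.tree + " " + rt.tree + ")");
                        }
                    }
                }
            }
        }
        return n == 0 ? null : chart.get(key(0, n - 1, start));
    }

    private static String key(int i, int j, String symbol) {
        return i + ":" + j + ":" + symbol;
    }

    /** Stores the candidate only if it beats what the chart already holds for that key. */
    private static void keepBest(Map<String, Entry> chart, String key, double p, String tree) {
        Entry old = chart.get(key);
        if (old == null || p > old.p) {
            chart.put(key, new Entry(p, tree));
        }
    }
}
```

With the rules S -> NP VP, NP -> Det N and VP -> V NP at probability 1.0, and
"dog" and "cat" each an N with probability 0.5, calling
parse("S", "the", "dog", "sees", "the", "cat") yields a tree rooted at S with
probability 0.25. A real implementation would estimate the rule probabilities
from a treebank and would normally work in log space to avoid underflow.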


Re: OPENNLP-683

2014-05-06 Thread Tommaso Teofili
+1 from me, I think it would be an interesting and useful contribution.

Tommaso


2014-05-06 20:50 GMT+02:00 Jörn Kottmann :

> Hello,
>
> we got a question in
>
> https://issues.apache.org/jira/browse/OPENNLP-683
>
> if it would be interesting to implement a rule based
> lemmatizer as explained in the issue.
>
> Any opinions about it?
>
> Jörn
>
>


Re: Pluggable Machine Learning support

2013-05-30 Thread Tommaso Teofili
big +1!

Tommaso


2013/5/31 William Colen 

> I don't see any issue. People who use Maxent directly would need to
> change how they use it, but that is OK for a major release.
>
>
>
>
> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann  wrote:
>
> > Are there any objections to moving the maxent/perceptron classes to an
> > opennlp.tools.ml package as part of this issue? Moving the things would
> > avoid a second interface layer and probably make using OpenNLP Tools a bit
> > easier, because then we are down to a single jar.
> >
> > Jörn
> >
> >
> > On 05/30/2013 08:57 PM, William Colen wrote:
> >
> >> +1 to add pluggable machine learning algorithms
> >> +1 to improve the API and remove deprecated methods in 1.6.0
> >>
> >> You can assign related Jira issues to me and I will be glad to help.
> >>
> >>
> >> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> we spoke about it here and there already: to ensure that OpenNLP can stay
> >>> competitive with other NLP libraries, I am proposing to make the machine
> >>> learning pluggable.
> >>>
> >>> The extensions should not make it harder to use OpenNLP. If a user loads a
> >>> model, OpenNLP should be capable of setting up everything by itself without
> >>> forcing the user to write custom integration code based on the ml
> >>> implementation.
> >>> We solved this problem already with the extension mechanism we built to
> >>> support the customization of our components. I suggest that we reuse this
> >>> extension mechanism to load a ml implementation. To use a custom ml
> >>> implementation the user has to specify the class name of the factory in the
> >>> Algorithm field of the params file. The params file is available during
> >>> training and tagging time.
> >>>
> >>> Most components in the tools package use the maxent library to do
> >>> classification. The Java interfaces for this are currently located in the
> >>> maxent package; to be able to swap the implementation, the interfaces should
> >>> be defined inside the tools package. To make things easier I propose to
> >>> move the maxent and perceptron implementation as well.
> >>>
> >>> Throughout the code base we use the AbstractModel. That's a bit unfortunate,
> >>> because the only reason for this is the lack of model serialization support
> >>> in the MaxentModel interface. A serialization method should be added to it,
> >>> and the interface maybe renamed to ClassificationModel. This will
> >>> break backward compatibility in non-standard use cases.
> >>>
> >>> To be able to test the extension mechanism I suggest that we implement an
> >>> addon which integrates liblinear and the Apache Mahout classifiers.
> >>>
> >>> There are still a few deprecated 1.4 constructors and methods in OpenNLP
> >>> which directly reference interfaces and classes in the maxent library;
> >>> these need to be removed to be able to move the interfaces to the tools
> >>> package.
> >>>
> >>> Any opinions?
> >>>
> >>> Jörn
> >>>
> >>>
> >
>
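
The loading step proposed in this thread (a factory class name stored in the
Algorithm field of the params file) could look roughly like the sketch below.
Everything named here is an assumption for illustration: ClassifierFactory,
DemoClassifierFactory and MlPluginLoader are hypothetical and are not OpenNLP
classes, and the params file is assumed to use java.util.Properties syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical plug-in contract; not an OpenNLP interface. */
interface ClassifierFactory {
    String describe();
}

/** Hypothetical custom implementation that a user might ship in an add-on jar. */
final class DemoClassifierFactory implements ClassifierFactory {
    @Override
    public String describe() {
        return "demo classifier";
    }
}

/** Hypothetical loader that resolves the factory named in the params. */
final class MlPluginLoader {
    private MlPluginLoader() {}

    /** Parses params-file text written in java.util.Properties syntax. */
    static Properties readParams(String text) {
        Properties params = new Properties();
        try {
            params.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return params;
    }

    /** Instantiates the factory whose class name is stored under the "Algorithm" key. */
    static ClassifierFactory load(Properties params) {
        String className = params.getProperty("Algorithm");
        if (className == null) {
            throw new IllegalArgumentException("params contain no Algorithm field");
        }
        try {
            // The factory needs a no-argument constructor for this to work.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            if (!(instance instanceof ClassifierFactory)) {
                throw new IllegalArgumentException(className + " is not a ClassifierFactory");
            }
            return (ClassifierFactory) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Because the class name travels with the params, the same lookup can run at
training time and at tagging time without the user writing integration code,
which is the property the proposal asks for.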


Re: How to make RC2?

2013-03-08 Thread Tommaso Teofili
Hi Jorn,

as far as I know, if you have already run mvn release:perform you have to do
that manually (too bad, I know); otherwise, if you've only run
release:prepare, you can safely run mvn release:rollback and have everything
as before.

Regards,
Tommaso



2013/3/8 Jörn Kottmann 

> Hi all,
>
> the question came up on how the RC1 we just did can be rolled back to make
> RC2.
>
> To do that the versions in the poms all need to be changed back to
> 1.5.3-SNAPSHOT and 3.0.3-SNAPSHOT, respectively.
> I always did that manually in Eclipse with the search and replace tool.
>
> The Maven release plugin's rollback can only be used after the prepare
> stage, but not after release:perform.
>
> HTH,
> Jörn
>


Re: Migrate to Git?

2012-12-20 Thread Tommaso Teofili
in my opinion that would be good, +1
Tommaso


2012/12/19 Jörn Kottmann 

> Hi all,
>
> I heard at ApacheCon Europe that it should be possible to migrate from
> Subversion to Git.
>
> Is there any interest in doing that? If we decide to do it I suggest we
> wait until the 1.5.3 release is done so we have a bit of time to also
> migrate our build process.
>
> Do all committers have experience with Git?
>
> Jörn
>


Re: how to train with CorpusServer

2012-06-21 Thread Tommaso Teofili
Thanks Jörn, I'll just try the tools until the guide is ready.
Tommaso

2012/6/13 Jörn Kottmann 

> Hello,
>
> no, there is no such guide or tutorial yet.
> I will write a getting started page for it and add it to our wiki.
> But the tools are there and now also much easier to use, because
> some required components can now be used in their released versions.
>
> Jörn
>
>
> On 06/12/2012 02:30 PM, Tommaso Teofili wrote:
>
>> Hi all,
>> reading [1] and back to OPENNLP-385 [2] I wonder if a guide/wiki on how to
>> use the CorpusServer (eventually with UIMA CAS Editor) to train data
>> exists
>> or it'd be possible to create one.
>> I remember there was a more generic wiki page [3] but perhaps a step by
>> step guide could be useful too.
>> Regards,
>> Tommaso
>>
>> [1] : https://issues.apache.org/jira/browse/LUCENE-2899
>> [2] : https://issues.apache.org/jira/browse/OPENNLP-385
>> [3] : https://cwiki.apache.org/OPENNLP/opennlp-annotations.html
>>
>>
>


how to train with CorpusServer

2012-06-12 Thread Tommaso Teofili
Hi all,
reading [1] and back to OPENNLP-385 [2] I wonder if a guide/wiki on how to
use the CorpusServer (eventually with UIMA CAS Editor) to train data exists
or it'd be possible to create one.
I remember there was a more generic wiki page [3] but perhaps a step by
step guide could be useful too.
Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/LUCENE-2899
[2] : https://issues.apache.org/jira/browse/OPENNLP-385
[3] : https://cwiki.apache.org/OPENNLP/opennlp-annotations.html