Re: [VOTE] Apache OpenNLP 2.3.3 (rc1)

2024-04-22 Thread Tommaso Teofili
+1

tag builds ok, sigs ok.

Regards,
Tommaso

On Mon, 22 Apr 2024 at 08:56, Bruno Kinoshita 
wrote:

> +1
>
> Tag is building fine on my env:
>
> Apache Maven 3.8.5 (3599d3414f046de2324203b78ddcf9b5e4388aa0)
> Maven home: /opt/apache-maven-3.8.5
> Java version: 17.0.10, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "5.15.0-105-generic", arch: "amd64", family: "unix"
>
> Thank you!
>
> Bruno
>
> On Sun, 21 Apr 2024 at 10:14, Martin Wiesner  wrote:
>
> > Hi folks,
> >
> > I have posted a 1st release candidate (rc1) for the Apache OpenNLP 2.3.3
> > release and it is ready for testing.
> >
> > This release brings four dependency updates, two bug fixes, minor
> > corrections in the manual, and working integration tests (IT) again!
> > The ITs were not executed for quite some time, but are executed for every
> > regular Maven build now; future work should include introducing a
> > separate Maven profile for executing ITs.
> > The manual's CSS got modernized; a preview can be found here:
> > https://github.com/apache/opennlp/pull/591
> > Moreover, this release will ship an abbreviation dictionary for the Dutch
> > language.
> >
> > Thank you to everyone who contributed to this release, including all of
> > our users and the people who submitted bug reports, contributed code or
> > documentation enhancements.
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Maven Repo:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1038
> >
> > <repositories>
> >   <repository>
> >     <id>opennlp-2.3.3-rc1</id>
> >     <name>Testing OpenNLP 2.3.3 release candidate</name>
> >     <url>https://repository.apache.org/content/repositories/orgapacheopennlp-1038</url>
> >   </repository>
> > </repositories>
> >
> > Binaries & Source:
> >
> > https://dist.apache.org/repos/dist/dev/opennlp/opennlp-2.3.3
> >
> > Tag:
> >
> > https://github.com/apache/opennlp/releases/tag/opennlp-2.3.3
> >
> > Release notes:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12354199
> >
> > The results of the eval tests for the aforementioned tag can be found
> > here: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-releases/12
> >
> > Reminder: The up-2-date KEYS file for signature verification can be
> > found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS
> >
> > Please vote on releasing these packages as Apache OpenNLP 2.3.3.
> > The vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > Please VOTE
> >
> > [+1] go ship it
> > [+0] meh, don't care
> > [-1] stop, there is a ${showstopper}
> >
> > Thanks!
> >
> > Martin | mawiesne
> >
> >
>
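
The passing rule quoted in the vote mail ("at least three binding +1 votes")
can be sketched as a small shell check. This is only an illustration: the
helper names and the one-vote-per-line input format ("<vote> <binding|non-binding>")
are assumptions made for this sketch, not part of the OpenNLP release process,
and the sketch does not model how -1 votes are handled.

```shell
# Hypothetical helper: reads one vote per line from stdin in the form
# "<vote> <binding|non-binding>", e.g. "+1 binding".
# Prints the number of binding +1 votes (0 when there are none).
count_binding_plus_ones() {
  awk '$1 == "+1" && $2 == "binding" { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeeds (exit status 0) when at least three binding
# +1 votes were read from stdin, the threshold stated in the vote mail.
vote_passes() {
  [ "$(count_binding_plus_ones)" -ge 3 ]
}

# Example tally: three binding +1 votes and one non-binding +1 vote.
printf '%s\n' '+1 binding' '+1 binding' '+1 non-binding' '+1 binding' \
  | vote_passes && echo "vote passes"
```

Non-binding votes are deliberately ignored by the count, mirroring the
statement that only PMC votes are binding.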


Re: [VOTE] Apache OpenNLP 2.3.1 Release Candidate

2023-11-23 Thread Tommaso Teofili
+1

Tommaso

On Thu, 23 Nov 2023 at 12:05, Richard Zowalla  wrote:

> +1 (binding)
>
>
> (We should create an issue for the year in the NOTICE file though)
>
> On Wednesday, 22.11.2023 at 15:12 +0100, Martin Wiesner wrote:
> >
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 2.3.1
> > release and it is ready for testing.
> >
> > It is a maintenance release which provides some enhancements.
> > Some of these are related to sentence models and the use of
> > abbreviations, see OPENNLP-570 & OPENNLP-793.
> > Moreover, it switches the ONNX runtime for the 'opennlp-dl' component
> > from the GPU to the CPU-based variant, see OPENNLP-1515.
> > Several other (cleanup) tasks have also been completed.
> >
> > Thank you to everyone who contributed to this release, including all
> > of our users and the people who submitted bug reports, contributed
> > code or documentation enhancements.
> >
> > The release was made using the OpenNLP release process, documented on
> > the website:
> > https://opennlp.apache.org/release.html
> >
> > Maven Repo:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1035
> >
> > <repositories>
> >   <repository>
> >     <id>opennlp-2.3.1-rc1</id>
> >     <name>Testing OpenNLP 2.3.1 release candidate</name>
> >     <url>https://repository.apache.org/content/repositories/orgapacheopennlp-1035</url>
> >   </repository>
> > </repositories>
> >
> > Binaries & Source:
> >
> > https://dist.apache.org/repos/dist/dev/opennlp/opennlp-2.3.1
> >
> > Tag:
> >
> > https://github.com/apache/opennlp/releases/tag/opennlp-2.3.1
> >
> > Release notes:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12353478
> >
> > The results of the eval tests for the aforementioned tag can be found
> > here:
> > https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-releases/9/
> >
> > Reminder: The up-2-date KEYS file for signature verification can be
> > found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS
> >
> > Please vote on releasing these packages as Apache OpenNLP 2.3.1. The
> > vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to
> > check the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > Please VOTE
> >
> > [+1] go ship it
> > [+0] meh, don't care
> > [-1] stop, there is a ${showstopper}
> >
> > Thanks!
> > mawiesne
>
>


Re: OpenNLP 2.3.0 ?

2023-07-25 Thread Tommaso Teofili
+1 for me too

Tommaso

On Tue, 25 Jul 2023 at 16:04, Martin Wiesner 
wrote:

>
> I'm also +1.
>
> Best
> Martin
> --
> On Tuesday, July 25, 2023 13:56 CEST, Bruno Kinoshita <
> brunodepau...@gmail.com> wrote:
>  Not from me, +1 !
>
> On Tue, 25 Jul 2023 at 10:49, Richard Zowalla  wrote:
>
> > Hi all,
> >
> > any objections in doing OpenNLP 2.3.0 after we have merged the memory
> > improvement by Martin?
> >
> > We have some dependency updates, resource leak fixes and Java 17.
> >
> > Thoughts?
> >
> > Regards
> > Richard
> >
>
>
>
>


Re: Renaming of master branch to main

2022-12-15 Thread Tommaso Teofili
+1 from me as well.

Tommaso

On Thu, 15 Dec 2022 at 11:58, Richard Zowalla  wrote:
>
> Hi,
>
> I am +1 for that one.
>
> Most of the mechanical work is done by INFRA.
> We only need to adjust Buildbot and GitHub Actions.
>
> Regards
> Richard
>
> On 2022/07/28 13:22:29 Jeff Zemerick wrote:
> > Hi all,
> >
> > I would like the community's input on renaming the master branch
> > to main for the opennlp and opennlp-site repositories. I see quite a few
> > ASF projects have made the change and I personally think we should, too,
> > for the same reason of promoting inclusivity.
> >
> > I think we would have to update the target branch of the outstanding PRs,
> > update the build config, and perhaps let Infra know so Buildbot for
> > opennlp-site will keep working. Anything else?
> >
> > Thoughts? Does this require a vote?
> >
> > Thanks,
> > Jeff
> >


Re: OpenNLP 2.0 release discussion

2022-04-06 Thread Tommaso Teofili
+1

Tommaso

On Wed, 6 Apr 2022 at 04:00, Bruno P. Kinoshita
 wrote:

>  +1 Jeff, thanks!
>
> Bruno
>
> On Wednesday, 6 April 2022, 02:34:35 am NZST, Jeff Zemerick <
> jzemer...@apache.org> wrote:
>
>  Hi all,
>
> I would like to propose an OpenNLP 2.0 release for the following reasons:
>
> - There are a few significant changes: Building using Java 11, support for
> ONNX models, automatic model downloading
> - User activity has been somewhat low and a 2.0 release might help bring
> attention to these new features.
> - 1.x has been around for 10+ years. :)
> - Other reasons?
>
> Thoughts? Concerns?
>
> Thanks,
> Jeff
>
>
> Our current master branch has the following changes:
>
> Bug
> [OPENNLP-1353] - DictonaryLemmatizer missing charset
>
> Improvement
> [OPENNLP-565] - Add MASC format support
> [OPENNLP-1185] - Tokenizers should be able to output a new line token
> [OPENNLP-1306] - NameSample overlap exception not helpful
>
> Task
> [OPENNLP-1318] - Add ability to download models from within OpenNLP
> [OPENNLP-1351] - Support ONNX models
> [OPENNLP-1354] - Change build to use Java 11
> [OPENNLP-1355] - Document ONNX capability introduced in OPENNLP-1351
> [OPENNLP-1356] - Document the ONNX implementations
> [OPENNLP-1359] - Build fails with Java 17
> [OPENNLP-1364] - Move setKeepNewLines to the Tokenizer class
>
> Documentation
> [OPENNLP-1319] - The Training API code is outdated in Manual
>


Re: [VOTE] Apache OpenNLP Models 1.0

2021-05-12 Thread Tommaso Teofili
Hi all,

sorry for the delay, I've finally managed to have a look at the artifacts
and they look good to me (sigs, etc.).

+1 from me

Regards,
Tommaso

On Sun, 25 Apr 2021 at 10:10, Bruno P. Kinoshita  wrote:

>  Hi Jeff,
>
> Sorry for the delay in voting. From what I recall, one issue raised in the
> last thread was the location of the models. These look to be hosted in
> ASF dist folders, so there shouldn't be any more issues.
>
> Checked signatures and found no errors. Looked at each text file in the
> dist area and everything looks good. Verified that the mentioned files (logs
> archive, example models, etc.) all existed and had no issues.
> Everything OK.
>
> The NOTICE file says 2017, and in Commons we try to update that to the
> year of the release of the component, but that shouldn't be a blocker I
> think.
>
> +1
>
> Thank you!
> Bruno
>
> On Friday, 23 April 2021, 4:42:41 am NZST, Jeff Zemerick <
> jzemer...@apache.org> wrote:
>
>  Hi all,
>
> Just a reminder that this vote thread is still active.
>
> Thanks,
> Jeff
>
>
> On Wed, Apr 14, 2021 at 10:41 AM Jeff Zemerick 
> wrote:
>
> > All,
> >
> > I am calling a vote to release the Apache OpenNLP models trained on the
> > Universal Dependencies corpus.
> >
> > The models and text files showing training and evaluation results along
> > with information about how the models were trained are available at:
> > https://dist.apache.org/repos/dist/dev/opennlp/ud-models-1.0/
> >
> > Upon a successful vote the models will be made available on the Apache
> > OpenNLP website on the Models Download page (
> > https://opennlp.apache.org/models.html). Work will then continue on a
> > pull request to modify OpenNLP to be able to automatically download and
> > use these pre-trained models when a local model is not provided by the
> > user. (The goal being to lower the barrier of entry into OpenNLP and make
> > its use more convenient.)
> >
> > Please vote on releasing the models as model version 1.0. The vote is
> > open for at least the next 72 hours.
> >
> > Only votes from the OpenNLP PMC are binding but everyone is welcome to
> > check the models and vote. The vote passes if at least three binding +1
> > votes are cast.
> >
> > [ ] +1 Release the models as version 1.0.
> > [ ] -1 Do not release the models as 1.0 because...
> >
> > Thanks!
> > Jeff
> >
> >
>


Re: [VOTE] Release OpenNLP Models trained on UD

2021-03-12 Thread Tommaso Teofili
+1

Tommaso

On Sat, 13 Mar 2021 at 08:25, Bruno P. Kinoshita  wrote:

>  [x] +1 Release the models as model version 1.0.
>
>
> On Saturday, 13 March 2021, 2:39:37 am NZDT, Jeff Zemerick <
> jzemer...@apache.org> wrote:
>
>  All,
>
> I am calling a vote to release the Apache OpenNLP models trained on the
> Universal Dependencies corpus. The models were described in a previous
> thread you can see at
> https://www.mail-archive.com/dev@opennlp.apache.org/msg03054.html.
>
> This vote is to release the models as version 1.0. (The models are still
> available in the Dropbox folder at
> https://www.dropbox.com/sh/p8focuz0qwvw84b/AAC6GqO8mqZn_xkAqHZsVAsoa?dl=0
> along with text files showing the training and evaluation results).
>
> Upon a successful vote the models will be made available on the Apache
> OpenNLP website on the Models Download page (
> https://opennlp.apache.org/models.html). Work will then continue to modify
> OpenNLP to be able to automatically download and use these pre-trained
> models when none is specifically loaded by the user. (The goal being to
> lower the barrier of entry into OpenNLP and make its use more convenient.)
>
> Please vote on releasing the models in the linked DropBox folder as model
> version 1.0. The vote is open for at least the next 72 hours.
>
> Only votes from the OpenNLP PMC are binding but everyone is welcome to
> check the models and vote. The vote passes if at least three binding +1
> votes are cast.
>
> [ ] +1 Release the models as model version 1.0.
> [ ] -1 Do not release the models because...
>
> Thanks!
> Jeff
>


Re: [VOTE] Apache OpenNLP 1.9.3 Release Candidate

2020-07-29 Thread Tommaso Teofili
+1 from me, build, sigs, tag look good.

Regards,
Tommaso

On Tue, 28 Jul 2020 at 10:48, Bruno P. Kinoshita  wrote:

> It worked after I imported keys from
> https://dist.apache.org/repos/dist/release/opennlp/KEYS
>
> [x] +1 Release the packages as Apache OpenNLP 1.9.3
>
>
> Thanks!
> Bruno
>
>
> On Monday, 27 July 2020, 12:00:29 am NZST, Jeff Zemerick <
> jzemer...@apache.org> wrote:
>
>
>
>
>
> Looks like I'm in there as jzemerick. See if I'm doing this correctly:
>
> wget https://people.apache.org/keys/group/opennlp.asc
> gpg --import opennlp.asc
>
> wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz
> wget https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/opennlp-distr-1.9.3-bin.tar.gz.asc
>
> gpg --verify opennlp-distr-1.9.3-bin.tar.gz.asc
> gpg: assuming signed data in 'opennlp-distr-1.9.3-bin.tar.gz'
> gpg: Signature made Fri Jul 24 15:21:24 2020 UTC
> gpg:                using RSA key 6786BCFFBD2AE66E737FE97760E63AD841EF12D8
> gpg: Good signature from "Jeff Zemerick (CODE SIGNING KEY) <jzemer...@apache.org>" [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 6786 BCFF BD2A E66E 737F  E977 60E6 3AD8 41EF 12D8
>
> Jeff
>
>
> On Sun, Jul 26, 2020 at 5:25 AM Bruno P. Kinoshita 
> wrote:
>
> > Hi,
> >
> >
> > Built successfully from tag with Java 8 on Ubuntu LTS. Had a look at one
> > file from the dist area, and the contents looked OK (license, notice, and
> > jars were using the right version 1.9.3 too).
> >
> >
> > Also checked the signatures using a shell script I normally use, but it
> > failed to validate. I think it failed to find your key in
> > https://people.apache.org/keys/group/opennlp.asc. Have you added your key
> > there? I searched for Jeff and jzonthemtn, but couldn't find it.
> >
> >
> > Cheers
> >
> > Bruno
> >
> >
> >
> > On Saturday, 25 July 2020, 11:08:12 pm NZST, Jeff Zemerick <
> > jzemer...@apache.org> wrote:
> >
> >
> >
> >
> >
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 1.9.3
> > release and it is ready for testing.
> >
> > The distributables can be downloaded from:
> >
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1027/org/apache/opennlp/opennlp-distr/1.9.3/
> >
> > The release was made from the Apache OpenNLP 1.9.3 tag at:
> > https://github.com/apache/opennlp/tree/opennlp-1.9.3
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.9.3 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1027
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.9.3. The vote
> > is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.9.3
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Jeff
> >
>
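
The signature check discussed in this thread (import the project KEYS, then
verify the artifact against its detached .asc file) can be sketched as below.
The function names are hypothetical and chosen for this sketch only; the KEYS
URL and the artifact name are the ones quoted in the thread. The verification
function needs network access, wget, and gpg, so it is defined here but not
run.

```shell
# URL of the project KEYS file, as given in the thread.
KEYS_URL="https://dist.apache.org/repos/dist/release/opennlp/KEYS"

# Hypothetical helper: derive the detached-signature file name for an artifact.
signature_for() {
  printf '%s.asc\n' "$1"
}

# Hypothetical helper: import the KEYS file, then verify one already
# downloaded artifact against its detached signature in the same directory.
verify_artifact() {
  artifact="$1"
  wget -q -O KEYS "$KEYS_URL" || return 1
  gpg --import KEYS || return 1
  gpg --verify "$(signature_for "$artifact")" "$artifact"
}

# Example call (not executed here):
#   verify_artifact opennlp-distr-1.9.3-bin.tar.gz
```

Passing both the signature and the artifact to gpg --verify avoids relying on
gpg guessing the signed file, which is what the "assuming signed data in ..."
line in the quoted output reflects.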


Re: [VOTE] Apache OpenNLP 1.9.2 Release Candidate

2019-12-23 Thread Tommaso Teofili
+1 (binding)

tag build succeeds (jdk 8), signatures ok.

Regards,
Tommaso

On Mon, 23 Dec 2019 at 13:32, Jeff Zemerick  wrote:

> +1 binding
>
> verified signatures
> built and tested from opennlp-1.9.2 tag using openjdk 8
>
> On Fri, Dec 20, 2019 at 11:07 AM Jeff Zemerick 
> wrote:
>
> > Hi folks,
> >
> > I have posted a 1st release candidate for the Apache OpenNLP 1.9.2
> > release and it is ready for testing.
> >
> > The distributables can be downloaded from:
> >
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1026/org/apache/opennlp/opennlp-distr/1.9.2/
> >
> > The release was made from the Apache OpenNLP 1.9.2 tag at:
> > https://github.com/apache/opennlp/tree/opennlp-1.9.2
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.9.2 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1026
> >
> > The release was made using the OpenNLP release process, documented on the
> > website:
> > https://opennlp.apache.org/release.html
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.9.2. The vote
> > is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> > the release candidate and vote.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.9.2
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Jeff
> >
>


Re: OpenNLP 1.9.2 and Java 8/11

2019-12-14 Thread Tommaso Teofili
+1

Thanks Jeff!

On Sat, 14 Dec 2019 at 15:32, Jeff Zemerick
wrote:

> During preparation for a 1.9.2 release it was noticed that the current
> master branch fails a few of the regression tests when built using OpenJDK
> 11. (All tests pass when using OpenJDK 8.) Unless there are any significant
> objections, the 1.9.2 release will be built using OpenJDK 8, and the failing
> regression tests on OpenJDK 11 [1] will be addressed in the next minor
> release.
>
> Thanks,
> Jeff
>
> [1] https://issues.apache.org/jira/browse/OPENNLP-1285
>


Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-07-02 Thread Tommaso Teofili
+1
On Mon, 2 Jul 2018 at 10:34, Rodrigo Agerri
wrote:

> +1
>
> Rodrigo
>
> On Sun, Jul 1, 2018 at 12:42 AM, Koji Sekiguchi
>  wrote:
> > I tested mvn install and some Eval tests (OntoNotes4NameFinderEval,
> > Conll02NameFinderEval, OntoNotes4PosTaggerEval) which use
> > FeatureGeneratorUtil.
> >
> > +1
> >
> > Koji
> >
> >
> >
> > On 2018/06/29 20:45, Jeff Zemerick wrote:
> >>
> >> Hi folks,
> >>
> >> I have posted a 2nd release candidate for the Apache OpenNLP 1.9.0
> >> release and it is ready for testing.
> >>
> >> The distributables can be downloaded from:
> >>
> >> https://repository.apache.org/content/repositories/orgapacheopennlp-1022/org/apache/opennlp/opennlp-distr/1.9.0/
> >>
> >> The release was made from the Apache OpenNLP 1.9.0 RC2 tag at:
> >> https://github.com/apache/opennlp/tree/opennlp-1.9.0-rc2
> >>
> >> To use it in a maven build set the version for opennlp-tools or
> >> opennlp-uima to 1.9.0 and add the following URL to your settings.xml file:
> >> https://repository.apache.org/content/repositories/orgapacheopennlp-1022
> >>
> >> The release was made using the OpenNLP release process, documented on the
> >> website:
> >> https://opennlp.apache.org/release.html
> >>
> >> Please vote on releasing these packages as Apache OpenNLP 1.9.0. The vote
> >> is open for at least the next 72 hours.
> >>
> >> Only votes from OpenNLP PMC are binding, but everyone is welcome to check
> >> the release candidate and vote.
> >> The vote passes if at least three binding +1 votes are cast.
> >>
> >> [ ] +1 Release the packages as Apache OpenNLP 1.9.0
> >> [ ] -1 Do not release the packages because...
> >>
> >> Thanks!
> >> Jeff
> >>
> >
>


NLP - OSS workshop @ ACL 2018

2018-03-15 Thread Tommaso Teofili
Hi all (sorry for cross posting),

this year ACL 2018 [1] will run a workshop on NLP in the open source [2].
CFP is open until March 25th, you can find the list of topics at [3].
Here's the list of invited speakers:
- Christopher Manning, Stanford University
- Matthew Honnibal and Ines Montani, Explosion AI
- Joel Nothman, University of Sydney

I think it'd be an interesting opportunity for projects at ASF to share
interesting work and experience in the NLP OSS space.

Regards,
Tommaso

[1] : http://acl2018.org/
[2] : http://nlposs.github.io
[3] : https://nlposs.github.io/#cfp


Re: [DISCUSS] - (ONIP-1) Better language model support

2018-02-01 Thread Tommaso Teofili
ok Suneel, I'll put down a more detailed design on a gdoc and share it here
as soon as I have it.

Regards,
Tommaso

On Sat, 27 Jan 2018 at 18:32, Suneel Marthi <
suneel.mar...@gmail.com> wrote:

> Thanks Tommaso.
>
> Could you share a Google doc with the design? We can post the same onto the
> Wiki after the Google doc has been finalized.
>
> It's easier to comment on and make changes to a Google doc.
>
> On Sat, Jan 27, 2018 at 9:50 AM, Tommaso Teofili <
> tommaso.teof...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > recently I've created
> > https://cwiki.apache.org/confluence/display/OPENNLP/ONIP-1+Better+language+model+support
> > as a description of possible useful improvements to our ngram language model
> > implementation.
> > Feedback welcome.
> >
> > Regards,
> > Tommaso
> >
> > p.s.:
> > we created a wiki page containing possible such improvements at
> > https://cwiki.apache.org/confluence/display/OPENNLP/OpenNLP+Improvement+Proposals,
> > feel free to create other proposals
> >
>


[DISCUSS] - (ONIP-1) Better language model support

2018-01-27 Thread Tommaso Teofili
Hi all,

recently I've created
https://cwiki.apache.org/confluence/display/OPENNLP/ONIP-1+Better+language+model+support
as a description of possible useful improvements to our ngram language model
implementation.
Feedback welcome.

Regards,
Tommaso

p.s.:
we created a wiki page containing possible such improvements at
https://cwiki.apache.org/confluence/display/OPENNLP/OpenNLP+Improvement+Proposals,
feel free to create other proposals


Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-21 Thread Tommaso Teofili
+1 build ok, tag ok, sigs ok

Tommaso

On Thu, 21 Dec 2017 at 17:35, Dan Russ wrote:

> [ X] +1 Release the packages as Apache OpenNLP 1.8.4
>
> > On Dec 21, 2017, at 9:44 AM, Jeff Zemerick  wrote:
> >
> > Hi Folks,
> >
> > I have posted a first release candidate for the Apache OpenNLP 1.8.4
> > release and it is ready for testing.
> >
> > The RC1 distributables can be downloaded from here:
> >
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1020/org/apache/opennlp/opennlp-distr/1.8.4
> >
> > The release was made from the Apache OpenNLP 1.8.4 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.4
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.4 and add the following URL to your settings.xml file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1020
> >
> > The release was made using the OpenNLP release process, documented on the
> > Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite a few changes; please refer to the contained
> > issue list for details.
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.4. The vote
> > is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > the release candidate and voice their approval or disapproval. The vote
> > passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.8.4
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> > Jeff Zemerick
>
>


Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-25 Thread Tommaso Teofili
+1 (binding)

- source build from tag ok
- sigs and checks ok

On Wed, 25 Oct 2017 at 18:09, Steve Blackmon <
sblack...@apache.org> wrote:

>  +1 non-binding
>
> - source builds, tests pass
> - verified checksums and signatures
>
> Steve Blackmon
> sblack...@apache.org
>
> On Oct 25, 2017 at 10:17 AM, Dan Russ  wrote:
>
>
> +1 burrito
>
> Ran unit tests on my downstream code that uses opennlp-tools.
>
> On Oct 25, 2017, at 6:58 AM, Suneel Marthi  wrote:
>
> +1 binding
>
> 1. Verified Sigs and hashes
> 2. Ran a clean build from {src} * {zip, tar}
> 3. All unit tests pass
>
> On Wed, Oct 25, 2017 at 3:08 PM, Bruno P. Kinoshita <
> brunodepau...@yahoo.com.br.invalid> wrote:
>
> [ X ] +1 Release the packages as Apache OpenNLP 1.8.3
>
> `mvn clean test install` working fine, checked artefacts signatures,
> matching with what was in the vote e-mail.
>
> Currently on tag 1.8.3, commit b317159cb9857dc509c08a31a98dc61209f39bff
>
> Thanks for preparing this release.
>
> Cheers
> Bruno
>
>
>
> 
> From: Suneel Marthi 
> To: dev@opennlp.apache.org; us...@opennlp.apache.org
> Sent: Tuesday, 24 October 2017 10:29 PM
> Subject: [VOTE] Apache OpenNLP 1.8.3 Release Candidate
>
>
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.3 Release Candidate.
>
> The Release artifacts can be downloaded from:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1010/org/apache/opennlp/opennlp-distr/1.7.2/
>
> The release was made from the Apache OpenNLP 1.8.3 tag at
>
> https://github.com/apache/opennlp/tree/opennlp-1.8.3
>
> To use it in a maven build set the version for opennlp-tools or opennlp-uima
> to 1.8.3 and add the following URL to your settings.xml file:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1019/org/apache/opennlp/opennlp-distr/1.8.3/
>
> The artifacts have been signed with the Key - D3541808 found at
>
> http://people.apache.org/keys/group/opennlp.asc
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.3. The vote is
> open for either the next 72 hours or a minimum of 3 +1 PMC binding votes,
> whichever happens earlier.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.3
> [ ] -1 Do not release the packages because...
>
> Thanks again to all the committers and contributors for their work over the
> past few weeks.
>


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-12 Thread Tommaso Teofili
+1

Tommaso

On Mon, 11 Sep 2017 at 09:12, Joern Kottmann
wrote:

> Hi Folks,
>
>
> I have posted a second release candidate for the Apache OpenNLP 1.8.2
> release and it is ready for testing.
>
>
> The RC 2 distributables can be downloaded from here:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/
>
>
> The release was made from the Apache OpenNLP 1.8.2 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.2
>
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.2 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1018
>
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
> The release contains quite a few changes; please refer to the contained
> issue list for details.
>
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote
> is open for at least the next 72 hours.
>
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.2
> [ ] -1 Do not release the packages because...
>
>
> Thanks!
>
> Jörn
>
> P.S. Here is my +1.
>


Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-05 Thread Tommaso Teofili
+1

Tommaso

On Tue, 5 Sep 2017 at 05:08, Suneel Marthi
wrote:

> +1 binding
>
> On Mon, Sep 4, 2017 at 5:41 PM, Joern Kottmann  wrote:
>
> > Hi Folks,
> >
> >
> > I have posted a first release candidate for the Apache OpenNLP 1.8.2
> > release and it is ready for testing.
> >
> >
> > The RC 1 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/
> >
> >
> > The release was made from the Apache OpenNLP 1.8.2 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.2
> >
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.2 and add the following URL to your settings.xml
> > file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1017
> >
> > The release was made using the OpenNLP release process, documented on
> > the Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite a few changes; please refer to the contained
> > issue list for details.
> >
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote
> > is open for at least the next 72 hours.
> >
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > the release candidate and voice their approval or disapproval. The vote
> > passes if at least three binding +1 votes are cast.
> >
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.8.2
> > [ ] -1 Do not release the packages because...
> >
> >
> > Thanks!
> >
> > Jörn
> >
> > P.S. Here is my +1.
> >
>


Re: Release of TREC Dynamic Domain: Polar Dataset

2017-08-15 Thread Tommaso Teofili
cool, thanks Chris for sharing.

Regards,
Tommaso

On Wed, 9 Aug 2017 at 18:56, Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi,
>
> We have released our dataset collected from 2015-16 in the Polar Domain,
> called the TREC Dynamic Domain Polar dataset.
>
> Researchers interested in a rich dataset collected across the Scientific
> and Deep web can mine HTML pages, PDF files, images, video, audio, and other
> formats for scientific insights.
>
> The data is described here:
>
> https://github.com/chrismattmann/trec-dd-polar
>
> And available from the NSF Arctic Data Center here:
>
> https://arcticdata.io/catalog/#view/doi:10.18739/A2280J
>
> If you use the dataset in your work, please consider citing it:
>
> @inproceedings{burgess2015trec,
>   title={TREC Dynamic Domain: Polar Science.},
>   author={Burgess, Annie Bryant and Mattmann, Chris and Totaro, Giuseppe and McGibbney, Lewis John and Ramirez, Paul M},
>   booktitle={TREC},
>   year={2015}
> }
>
> (our TREC paper, and/or the DOI from the actual dataset).
>
> Enjoy!
>
> Cheers,
> Chris Mattmann
>
>
>
> ++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices
> (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>


Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Tommaso Teofili
+1 sigs and build ok, langdetect ok.

Regards,
Tommaso

On Fri, 7 Jul 2017 at 15:55, William Colen wrote:

> +1 - Tested with multiple other projects. Tested language detector.
>
> 2017-07-07 10:52 GMT-03:00 Joern Kottmann :
>
> > +1 I ran the eval tests and they passed
> >
> > Jörn
> >
> > On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita
> >  wrote:
> > > Build passing OK with the following environment:
> > > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> > 2015-11-11T05:41:47+13:00)
> > > Maven home: /opt/maven
> > > Java version: 1.8.0_131, vendor: Oracle Corporation
> > > Java home: /usr/lib/jvm/java-8-oracle/jre
> > > Default locale: en_US, platform encoding: UTF-8
> > > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family:
> > "unix"
> > >
> > > Had a look at simple reports (findbugs, pmd), all looking good.
> > > [ X ] +1 Release the packages as Apache OpenNLP 1.8.1
> > >
> > > Thanks
> > > Bruno
> > > 
> > > On Thursday, 6 July 2017, 1:21:32 AM NZST, Suneel Marthi <
> > smar...@apache.org> wrote:
> > >
> > >
> > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > > 1.8.1 Release Candidate 3.
> > >
> > > The Release artifacts can be downloaded from:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1016/org/apache/opennlp/opennlp-distr/1.8.1/
> > >
> > > The release was made from the Apache OpenNLP 1.8.1 tag at
> > >
> > > https://github.com/apache/opennlp/tree/opennlp-1.8.1
> > >
> > > To use it in a maven build set the version for opennlp-tools or
> > > opennlp-uima to 1.8.1
> > >
> > > and add the following URL to your settings.xml file:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1016/
> > >
> > > The artifacts have been signed with the Key - D3541808 found at
> > >
> > > http://people.apache.org/keys/group/opennlp.asc
> > >
> > > Please vote on releasing these packages as Apache OpenNLP 1.8.1. The
> > > vote is open for the next 72 hours *ending on Saturday, July 8AM EST *.
> > >
> > > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > > the release candidate and voice their approval or disapproval. The vote
> > > passes if at least three binding +1 votes are cast.
> > >
> > > [ ] +1 Release the packages as Apache OpenNLP 1.8.1
> > >
> > > [ ] -1 Do not release the packages because...
> > >
> > > Thanks again to all the committers and contributors for their work
> > > over the past
> > > few weeks.
> >
>


Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Tommaso Teofili
thanks Thamme for bringing this to the list!


On Wed, 5 Jul 2017 at 03:49, Thamme Gowda wrote:

> Hello OpenNLP Devs,
>
> I am working on text classification using word embeddings like
> GloVe/Word2Vec and LSTM networks.
> It will be interesting to see if we can use it as a document categorizer,
> especially for sentiment analysis in OpenNLP.
>
> I have already raised a PR to the sandbox repo -
> https://github.com/apache/opennlp-sandbox/pull/3
>
> This is a first version, and I hope to receive feedback from the dev
> community to make it work for everyone.
>
> Here are the design choices I have made for the initial version:
>
> - Using pre-trained GloVe vectors - I felt the GloVe vector format is clean
>   and easily customizable in terms of dimensions and vocabulary size (also,
>   I have been reading a lot about them from the Stanford NLP group).
>   - Training GloVe vectors isn't hard either; we can do it using the
>     original C library as well as by using DL4J.
> - Using DL4J's multi-layer networks with LSTM instead of reinventing
>   this stuff again on the JVM for OpenNLP
>
>
> Please share your feedback here or on the github page
> https://github.com/apache/opennlp-sandbox/pull/3 .
>
>
The approach outlined here sounds good; I think we could incorporate the PR
as soon as it implements the Doccat API.
Then we may see whether and how it makes sense to adjust it to use other
types of embeddings (e.g. paragraph vectors) and / or different network
setups (e.g. more hidden layers, bidirectional LSTM, etc.).

Looking forward to seeing this move forward,
Regards,
Tommaso


>
> Thanks,
> TG
>
>
> --
> *Thamme Gowda *
> @thammegowda  |
> http://scf.usc.edu/~tnarayan/
> ~Sent via somebody's Webmail server
>


Re: Public datasets for Semantic Relationship Extraction

2017-06-29 Thread Tommaso Teofili
Sure, sounds good to me.
It would be best to open a separate issue for each of the tasks.

Regards,
Tommaso

On Thu, 29 Jun 2017 at 16:01, Chris Mattmann <mattm...@apache.org> wrote:

> Hey Tommaso I was thinking both…but mainly use the datasets for specific
> tasks since
> they seem to be open. Labeled data is hard to come by.
>
> Thoughts?
>
> Cheers,
> Chris
>
>
>
> On 6/28/17, 11:43 PM, "Tommaso Teofili" <tommaso.teof...@gmail.com> wrote:
>
> Hi Chris,
>
> What do you mean specifically? Leveraging some of the work mentioned
> in the papers, or leveraging datasets for specific tasks? Or both?
> IIRC there was an OpenNLP page mentioning its usage for the bio NLP
> task, not sure about the others.
>
> Regards,
> Tommaso
>
>
> On Wed, 28 Jun 2017 at 20:58, Chris Mattmann <mattm...@apache.org> wrote:
>
> > Ahh here it is, sorry about that:
> >
> >
> https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets
> >
> >
> >
> > On 6/28/17, 11:52 AM, "Suneel Marthi" <smar...@apache.org> wrote:
> >
> > Forced me to join that group first - so will patiently wait for
> the
> > group
> > moderator to consider/rule out my application to join that group
> and
> > then
> > maybe I get to read that post. 
> >
> >
> >
> > On Wed, Jun 28, 2017 at 2:44 PM, Chris Mattmann <
> mattm...@apache.org>
> > wrote:
> >
> > > Hi Team,
> > >
> > > Anything here that we can use in OpenNLP?
> > >
> > > https://www.linkedin.com/groups/131222/131222-
> > > 6284423593917063169?midToken=AQGRDKND99GRHQ=eml-b2_
> > > anet_digest_of_digests-hero-11-discussion~subject&
> > > trkEmail=eml-b2_anet_digest_of_digests-hero-11-discussion~
> > > subject-null-uh2g~j4hb54j7~h2-null-communities~group~
> > >
> discussion=urn%3Ali%3Apage%3Aemail_b2_anet_digest_of_digests%
> > > 3BnYRsTix4QoG8YsVuU%2FryIg%3D%3D
> > >
> > > CC’ing dev@tika too.
> > >
> > > Cheers,
> > > Chris
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>


Re: Public datasets for Semantic Relationship Extraction

2017-06-29 Thread Tommaso Teofili
Hi Chris,

What do you mean specifically? Leveraging some of the work mentioned in the
papers, or leveraging datasets for specific tasks? Or both?
IIRC there was an OpenNLP page mentioning its usage for the bio NLP task,
not sure about the others.

Regards,
Tommaso


On Wed, 28 Jun 2017 at 20:58, Chris Mattmann wrote:

> Ahh here it is, sorry about that:
>
> https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets
>
>
>
> On 6/28/17, 11:52 AM, "Suneel Marthi"  wrote:
>
> Forced me to join that group first - so will patiently wait for the
> group
> moderator to consider/rule out my application to join that group and
> then
> maybe I get to read that post. 
>
>
>
> On Wed, Jun 28, 2017 at 2:44 PM, Chris Mattmann 
> wrote:
>
> > Hi Team,
> >
> > Anything here that we can use in OpenNLP?
> >
> > https://www.linkedin.com/groups/131222/131222-
> > 6284423593917063169?midToken=AQGRDKND99GRHQ=eml-b2_
> > anet_digest_of_digests-hero-11-discussion~subject&
> > trkEmail=eml-b2_anet_digest_of_digests-hero-11-discussion~
> > subject-null-uh2g~j4hb54j7~h2-null-communities~group~
> > discussion=urn%3Ali%3Apage%3Aemail_b2_anet_digest_of_digests%
> > 3BnYRsTix4QoG8YsVuU%2FryIg%3D%3D
> >
> > CC’ing dev@tika too.
> >
> > Cheers,
> > Chris
> >
> >
> >
> >
> >
>
>
>
>


Re: [VOTE] Migrate our main repositories to GitHub

2017-06-28 Thread Tommaso Teofili
+1 to migrate to gitbox [1]

Regards,
Tommaso

[1] : https://gitbox.apache.org/

On Tue, 27 Jun 2017 at 21:54, Oleg Tikhonov wrote:

> [x] +1 Migrate all repositories to GitHub
>
>
>
> On Tue, Jun 27, 2017 at 10:48 PM, Chris Mattmann 
> wrote:
>
> > If you are talking about using Apache Gitbox, then yes I am +1 for this.
> >
> > Thanks,
> > Chris
> >
> >
> >
> >
> > On 6/27/17, 3:30 AM, "Joern Kottmann"  wrote:
> >
> > Hello all,
> >
> > let's decide here whether we want to move our main repository, currently
> > hosted at Apache, to GitHub instead. This will make our process a bit
> > easier because we can eliminate one remote from our workflow.
> >
> >  [ ] +1 Migrate all repositories to GitHub
> >  [ ] -1 Do not migrate,  because...
> >
> > Thanks,
> > Jörn
> >
> >
> >
> >
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Tommaso Teofili
+1 (binding)

Regards,
Tommaso

p.s.:

+1 also to Bruno's side comments
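
To make the array equals() finding quoted below concrete: calling equals() on
a Java array falls back to Object.equals, which compares references, so two
distinct arrays with identical contents are never equal (it only returns true
when both sides are the same array object). The following is a minimal sketch
of the difference, not the code from TokenTag or
POSTaggerNameFeatureGenerator; the class and method names are made up for
illustration.

```java
import java.util.Arrays;

class ArrayEqualsSketch {

    // Array.equals is inherited from Object, so this is a reference comparison.
    static boolean referenceEquals(String[] a, String[] b) {
        return a.equals(b);
    }

    // Arrays.equals compares the length and then the elements pairwise.
    static boolean contentEquals(String[] a, String[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        String[] first = {"NN", "VB"};
        String[] second = {"NN", "VB"};
        System.out.println("a.equals(b):         " + referenceEquals(first, second));
        System.out.println("Arrays.equals(a, b): " + contentEquals(first, second));
    }
}
```

The usual fix for this class of FindBugs warning is to switch to
Arrays.equals (and Arrays.hashCode in the matching hashCode method).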

On Thu, 18 May 2017 at 12:43, Bruno P. Kinoshita wrote:

>
> [ X ] +1 Release the packages as Apache OpenNLP 1.8.0
>
> Not binding
>
> Side note: would be nice later to start fixing some issues found via
> FindBugs. Running `mvn clean findbugs:findbugs findbugs:gui` shows several
> errors, some seem important, like using equals() for array objects (which
> will always be false).
>
> See
>
>
> https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/TokenTag.java#L85
>
> And
>
>
>
> https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/POSTaggerNameFeatureGenerator.java#L59
> Plus other NullPointerException's that can be prevented, and other minor
> issues. Not blockers for the release though, IMO.
>
> Cheers
> Bruno
>
>
> 
> From: Joern Kottmann 
> To: dev@opennlp.apache.org
> Sent: Thursday, 18 May 2017 9:49 AM
> Subject: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3
>
>
>
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>
> 1.8.0 Release Candidate 3.
>
>
> The RC 3 distributables can be downloaded from here:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/
>
>
> The release was made from the Apache OpenNLP 1.8.0 tag at
>
> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>
>
>
> To use it in a maven build set the version for opennlp-tools or
>
> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
>
> file:
>
> https://repository.apache.org/content/repositories/orgapacheopennlp-1013
>
>
>
> The release was made using the OpenNLP release process, documented on
>
> the Wiki here:
>
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
>
>
> The release contains quite some changes, please refer to the contained
>
> issue list for details.
>
>
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
>
> vote is open for at least the next 72 hours.
>
>
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check
>
> the release candidate and voice their approval or disapproval. The vote
>
> passes if at least three binding +1 votes are cast.
>
>
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.0
>
> [ ] -1 Do not release the packages because...
>
>
>
>
>
> Thanks!
>
>
> Jörn
>
>
> P.S. Here is my +1.
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread Tommaso Teofili
+1 (binding)

- source distr build succeeds
- build from tag succeeds
- signatures and hashes ok

Regards,
Tommaso

On Fri, 12 May 2017 at 01:11, Suneel Marthi wrote:

> +1 binding
>
> 1. Downloaded artifacts and ran thru a clean build - all unit tests pass
> 2. verified sigs and hashes
>
> On Thu, May 11, 2017 at 9:37 AM, Joern Kottmann 
> wrote:
>
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 2.
> >
> > The RC 2 distributables can be downloaded from here:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/
> >
> > The release was made from the Apache OpenNLP 1.8.0 tag at
> > https://github.com/apache/opennlp/tree/opennlp-1.8.0
> >
> > To use it in a maven build set the version for opennlp-tools or
> > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> > file:
> > https://repository.apache.org/content/repositories/orgapacheopennlp-1012
> >
> > The release was made using the OpenNLP release process, documented on
> > the Wiki here:
> > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> >
> > The release contains quite some changes, please refer to the contained
> > issue list for details.
> >
> > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> > vote is open for at least the next 72 hours.
> >
> > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > the release candidate and voice their approval or disapproval. The vote
> > passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> > [ ] -1 Do not release the packages because...
> >
> >
> > Thanks!
> >
> > Jörn
> >
> > P.S. Here is my +1.
> >
>


Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-22 Thread Tommaso Teofili
+1

- checked sigs
- build ok
- license ok

Regards,
Tommaso


On Sun, 22 Jan 2017 at 18:18, Joern Kottmann wrote:

> On Sat, 2017-01-21 at 21:09 -0500, Jeffrey Zemerick wrote:
> > I went to the opennlp-distr/README for a summary of changes in 1.7.1
> > but I
> > think it is the same as it was for 1.7.0. Is that file typically
> > updated
> > for revision releases? The link at the bottom of the RELEASE_NOTES to
> > the
> > fixed JIRA issues is issuesFixed/jira-report.html. Minor stuff but
> > thought
> > I'd ask.
> >
> >
>
> Yes, this file should be updated. And usually we do this, just this
> time we didn't, I think we should release anyway and if we have to do
> RC 2 we can update it.
>
> There is also another minor thing with a test for maxent qn which is
> not configured correctly.
>
> Anyway, besides that, which will be put right in 1.7.2, I don't know of
> anything that would keep us from taking RC 1 for the 1.7.1 release. I
> will have a more detailed look at it now.
>
> Jörn
>


Re: Commit message style

2017-01-10 Thread Tommaso Teofili
+1
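
For anyone who wants to script it, the "OPENNLP-xxx: commit message" subject
line convention discussed below could be checked with something like the
following. This is only a sketch: the class name and the exact pattern
(issue key, colon, one space, non-empty message) are my assumptions, not an
agreed project rule.

```java
import java.util.regex.Pattern;

class CommitSubjectSketch {

    // Hypothetical pattern: "OPENNLP-<digits>: " followed by a non-empty message.
    private static final Pattern SUBJECT = Pattern.compile("^OPENNLP-\\d+: \\S.*$");

    // Returns true if the first line of a commit message starts with the issue key.
    static boolean followsConvention(String subjectLine) {
        return SUBJECT.matcher(subjectLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(followsConvention("OPENNLP-837: Fail early on empty training data"));
        System.out.println(followsConvention("Fail early on empty training data (OPENNLP-837)"));
    }
}
```

Commits where the issue key only fits in the body would fail this check, so
it would be a hint rather than a hard gate.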

Tommaso

On Tue, 10 Jan 2017 at 11:20, Rodrigo Agerri wrote:

> +1 for the OPENNLP-xxx: commit message.
>
>
>
> On Tue, Jan 10, 2017 at 12:51 AM, William Colen 
> wrote:
>
> > +1 for the OPENNLP-xxx: commit message.
> > Fast to find a commit.
> >
> >
> > 2017-01-09 21:24 GMT-02:00 Joern Kottmann :
> >
> > > On Mon, 2017-01-09 at 17:02 -0500, Jeffrey Zemerick wrote:
> > > > I'm personally a fan of the issue number being the first thing on the
> > > > subject line, like "OPENNLP-xxx: commit message." For me it gives a
> > > > consistent place to look for the issue without having to read the
> > > > full
> > > > message. (That way you can also see the issue number in GitHub's
> > > > commit
> > > > list without having to expand the commit.)
> > >
> > >
> > > Yes, it is also faster to write like that, on the other hand if the
> > > subject line is then too short to write something meaningful it is
> > > probably better to write it in the body instead.
> > >
> > > +1 to write it first thing in the subject line in all cases where it is
> > > possible, for very rare cases where it doesn't work it can still be in
> > > the body
> > >
> > > Jörn
> > >
> >
>


Re: OpenNLP 1.7.0 RC 2 is ready for testing

2017-01-01 Thread Tommaso Teofili
+1

Source build ok
Sigs ok
License & co ok
On Sun, 1 Jan 2017 at 03:02, Richard Eckart de Castilho <
r...@apache.org> wrote:

> On 01.01.2017, at 02:41, Suneel Marthi  wrote:
> >
> > The release has been finalized - please find the 1.7.0 release artifacts
> at
> > http://www.apache.org/dist/opennlp/opennlp-1.7.0/
>
> Hm, I only saw two binding votes instead of the usual three ones [1].
>
>   Jörn: +1
>   William: +1
>   Suneel: +1 (non-binding)
>
> Did I miss a vote?
>
> I also checked the mailing list archive for additional votes [2].
>
> Cheers,
>
> -- Richard
>
> [1] http://apache.org/foundation/voting.html
> [2]
> http://mail-archives.apache.org/mod_mbox/opennlp-dev/201612.mbox/thread


Re: Update to Java 8

2016-12-19 Thread Tommaso Teofili
+1

Tommaso

On Mon, 19 Dec 2016 at 22:27, ARUN Thundyill Saseendran <
ats0...@gmail.com> wrote:

> +1 to move to 1.8
>
> On Tue, Dec 20, 2016 at 2:51 AM, Suneel Marthi <
> suneel_mar...@yahoo.com.invalid> wrote:
>
> > +1 to move to Java 8
> >
> >
> >   From: Joern Kottmann 
> >  To: "dev@opennlp.apache.org" 
> >  Sent: Monday, December 19, 2016 8:45 AM
> >  Subject: Update to Java 8
> >
> > Hello all,
> >
> > Java 7 is already EOL.
> >
> > Should we update OpenNLP to Java 8 for the 1.7.0 release, any opinions?
> >
> > Jörn
> >
> >
>
>
>
>
> --
>


Re: Access to Git

2016-09-30 Thread Tommaso Teofili
When did you push them? Another project I'm involved in had the very same
problem; after contacting infra@ and doing a trivial commit, the mirror
synced again.

Regards,
Tommaso

On Fri, 30 Sep 2016 at 13:02, Rodrigo Agerri wrote:

> Hello,
>
> I have committed and pushed some changes using the git repo, but they
> do not appear in the GitHub mirror
>
> https://github.com/apache/opennlp
>
> or in the svn repo
>
> http://svn.apache.org/viewvc/opennlp/trunk/
>
> They do, however, appear in the original git repo
>
> https://git-wip-us.apache.org/repos/asf?p=opennlp.git;a=summary
>
> Is this intentional?
>
> Cheers,
>
> Rodrigo
>
> On Mon, Sep 19, 2016 at 11:50 PM, Joern Kottmann 
> wrote:
> > The opennlp-addons repo is now also available, and opennlp-sandbox will
> > be available soon.
> >
> > Jörn
> >
> >
> > On Thu, 2016-09-15 at 01:12 +0200, Joern Kottmann wrote:
> >> Sorry, it took me a little to figure this out.
> >>
> >> This link explains how it works:
> >> https://reference.apache.org/committer/git
> >>
> >> The reponame is opennlp, we will soon also have the other repos
> >> opennlp-addons and opennlp-sandbox.
> >>
> >> Jörn
> >>
> >> > On Fri, Sep 9, 2016 at 10:58 PM, Joern Kottmann wrote:
> >> > Hello, yes you can use it. The add-ons and other things are not
> >> > setup yet as far as I know, have to ping the infra team about it.
> >> > Please have a look at the issue I posted to see how to access it.
> >> > I will work on this on Monday.
> >> > HTH
> >> >
> >> > Jörn
> >> >
> >> > > On Sep 9, 2016 19:10, "William Colen" <william.co...@gmail.com> wrote:
> >> > > Hello,
> >> > >
> >> > >
> >> > > Is the Git repository ready for use?
> >> > >
> >> > > Do we need to wait for it to develop new stuff?
> >> > >
> >> > >
> >> > >
> >> > > Thank you,
> >> > >
> >> > > William
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
>


Re: DeepLearning4J as a ML for OpenNLP

2016-06-28 Thread Tommaso Teofili
I had briefly looked into it a while ago, would be nice to collaborate
there.

Tommaso


On Tue, 28 Jun 2016 at 23:26, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Yep I think so - you may also look at SciSpark
> http://scispark.jpl.nasa.gov
> where we are using DL4J/ND4J and Breeze interchangeably here.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>
>
>
>
>
>
>
>
> On 6/28/16, 2:23 PM, "William Colen"  wrote:
>
> >Hi,
> >
> >Do you think it would be possible to implement a ML based on DL4J?
> >
> >http://deeplearning4j.org/
> >
> >Thank you
> >William
>


Re: OPENNLP-837

2016-03-11 Thread Tommaso Teofili
Hi Jeffrey,

thanks for your contribution.
I'll have a look at it and comment on the issue.
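
For context, the behaviour described in the quoted mail below (fail model
creation when there are zero unique training events) amounts to a guard like
the one sketched here. This is not the attached patch: the helper name is
hypothetical, events are reduced to plain strings, and the exception is a
local unchecked stand-in that only borrows the name mentioned in the mail.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TrainingGuardSketch {

    // Local stand-in; only the name is taken from the patch description.
    static class InsufficientTrainingDataException extends RuntimeException {
        InsufficientTrainingDataException(String message) {
            super(message);
        }
    }

    // Hypothetical helper: counts unique events and halts when there are none.
    static int requireUniqueEvents(List<String> events) {
        Set<String> unique = new HashSet<>(events);
        if (unique.isEmpty()) {
            throw new InsufficientTrainingDataException(
                "Insufficient training data to create model.");
        }
        return unique.size();
    }

    public static void main(String[] args) {
        System.out.println(requireUniqueEvents(List.of("a", "b", "a")));
    }
}
```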

Regards,
Tommaso


On Fri, 11 Mar 2016 at 14:29, Jeffrey Zemerick <
jzemer...@apache.org> wrote:

> Hi all,
>
> I attached a patch to OPENNLP-837 (
> https://issues.apache.org/jira/browse/OPENNLP-837). With the patch, if the
> number of unique training events is zero a new exception
> (InsufficientTrainingDataException) is thrown. The model creation halts on
> the exception. As before, if you have any comments or want anything done
> differently please let me know!
>
> Thanks,
> Jeff
>


Re: Language Model contribution

2016-02-18 Thread Tommaso Teofili
Hi Jörn,

good you're ok with the LanguageModel API; currently the only existing
implementation is the NGramLanguageModel.
In order to create such a model you add ngrams to it as in NGramModel:

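> // trigram language model (declared as NGramLanguageModel, since add()
> // comes from NGramModel rather than from the LanguageModel interface)
> NGramLanguageModel languageModel = new NGramLanguageModel(3);
> // uni/bi/tri-grams for tokenized text (StringList)
> languageModel.add(new StringList(tokens), 1, 3);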

Once done with adding ngrams you can compute the probability of e.g. a
tokenized sentence with:

> double p = languageModel.calculateProbability(
>     new StringList("neural", "network", "language"));

Internally then it uses Laplace smoothing [1] for computing probabilities
if |ngrams| < 1M, otherwise it uses Stupid Backoff [2].
You can also use the LM to predict the next ngram given a sequence of
tokens (but that iterates over all the ngrams in order to find the most
probable and could be slow).

> StringList tokens = languageModel.predictNextTokens(
>     new StringList("neural", "network", "language"));
> assertEquals(new StringList("models"), tokens);
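
To illustrate the two scoring schemes mentioned above, here is a small
self-contained sketch of textbook add-one (Laplace) smoothing and of Stupid
Backoff with the 0.4 factor suggested in [2]. It is deliberately independent
of OpenNLP: the class, the string-keyed count map and the method names are
assumptions made for the example, and the details need not match what
NGramLanguageModel does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SmoothingSketch {

    // n-gram counts keyed by space-joined tokens (illustrative storage only).
    private final Map<String, Integer> counts = new HashMap<>();
    private int totalUnigrams = 0;
    private int vocabularySize = 0;

    // Counts all n-grams of length 1..maxN in the given token sequence.
    public void addSentence(List<String> tokens, int maxN) {
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                String key = String.join(" ", tokens.subList(i, i + n));
                int updated = counts.merge(key, 1, Integer::sum);
                if (n == 1) {
                    totalUnigrams++;
                    if (updated == 1) {
                        vocabularySize++;
                    }
                }
            }
        }
    }

    private int count(List<String> ngram) {
        return counts.getOrDefault(String.join(" ", ngram), 0);
    }

    private static List<String> append(List<String> context, String word) {
        List<String> full = new ArrayList<>(context);
        full.add(word);
        return full;
    }

    // Add-one smoothing: (count(context + word) + 1) / (count(context) + |V|).
    public double laplace(List<String> context, String word) {
        int contextCount = context.isEmpty() ? totalUnigrams : count(context);
        return (count(append(context, word)) + 1.0) / (contextCount + vocabularySize);
    }

    // Stupid Backoff: relative frequency if the n-gram was seen, otherwise
    // 0.4 times the score obtained with the context shortened by one token.
    public double stupidBackoff(List<String> context, String word) {
        double factor = 1.0;
        List<String> ctx = context;
        while (!ctx.isEmpty()) {
            int seen = count(append(ctx, word));
            if (seen > 0) {
                return factor * seen / count(ctx);
            }
            factor *= 0.4;
            ctx = ctx.subList(1, ctx.size());
        }
        // Base case: unigram relative frequency.
        return factor * count(Arrays.asList(word)) / totalUnigrams;
    }

    public static void main(String[] args) {
        SmoothingSketch model = new SmoothingSketch();
        model.addSentence(Arrays.asList("the", "quick", "brown", "fox"), 3);
        System.out.println(model.laplace(Arrays.asList("quick", "brown"), "fox"));
        System.out.println(model.stupidBackoff(Arrays.asList("the", "quick"), "fox"));
    }
}
```

Note that Stupid Backoff returns scores rather than normalized probabilities,
which is part of why it is cheap enough for very large ngram sets.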

One can quickly have a look at its usage by looking at the
NgramLanguageModelTest#testTrigramLanguageModelCreationFromText [3].

Hope this helps and of course if there're any additional questions, feel
free to ask.
Regards,
Tommaso

[1] : https://en.wikipedia.org/wiki/Additive_smoothing
[2] : http://www.aclweb.org/anthology/D07-1090.pdf
[3] :
https://github.com/apache/opennlp/blob/trunk/opennlp-tools/src/test/java/opennlp/tools/languagemodel/NgramLanguageModelTest.java#L131

On Wed, 17 Feb 2016 at 19:39, Joern Kottmann wrote:

> Oops, I confused the language model you were working on with language
> detection.
> I think the interface is good as it is.
>
> Jörn
>
> On Wed, Feb 17, 2016 at 10:00 AM, Joern Kottmann 
> wrote:
>
> > Hello,
> >
> > I saw the language model commit. Thanks for contributing that!
> >
> > Would it be possible to get a short introduction to it?
> >
> > The interface is supposed to take a StringList. Wouldn't it be better if
> a
> > user can just pass in a String instead? Otherwise he has to worry about
> > tokenizing a string in a language he doesn't know. I think that should be
> > the task of the language detector.
> >
> > Can we come up with another name for the package? Maybe langid/langdetect
> > or something similar? Any opinions?
> >
> > The Model in LanguageModel we usually use to refer to machine learning
> > models, maybe we could rename this interface to LanguageDetector.
> >
> > Jörn
> >
>


Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?

2015-05-19 Thread Tommaso Teofili
Hi Cohan,

I think that'd be a very valuable contribution, as NB is one of the
foundation algorithms, often used as basis for comparisons.
It would be good if you could create a Jira issue and provide more details
about the implementation and, eventually, a patch.

Thanks and regards,
Tommaso

2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos cohan.su...@gmail.com:

 I have a question for the OpenNLP project team.

 I was wondering if there is a Naive Bayesian classifier implementation in
 OpenNLP that I've not come across, or if there are plans to implement one.

 If it is the latter, I should love to contribute an implementation.

 There is an ME classifier already available in OpenNLP, of course, but I
 felt that there was an unmet need for a Naive Bayesian (NB) classifier
 implementation to be offered as well.

 An NB classifier could be bootstrapped up with partially labelled training
 data as explained in the Nigam, McCallum, et al. paper of 2000, "Text
 Classification from Labeled and Unlabeled Documents using EM".

 So, if there isn't an NB code base out there already, I'd be happy to
 contribute a very solid implementation that we've used in production for a
 good 5 years.

 I'd have to adapt it to load the same training data format as the ME
 classifier, but I guess that shouldn't be very difficult to do.

 I was wondering if there was some interest in adding an NB implementation
 and, if there is, I'd love to know whom I could coordinate with.

 Cohan Sujay Carlos
 CEO, Aiaioo Labs, India
 +91-77605-80015 +91-80-4125-0730



Re: svn commit: r1670574 - /opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java

2015-04-01 Thread Tommaso Teofili
I think you're right, I'll revert it and resolve OPENNLP-764 as "Not a
Problem", sorry for the noise.

Tommaso

2015-04-01 15:05 GMT+02:00 Joern Kottmann kottm...@gmail.com:

 The adaptive data is cleared in the documentDone method. The statement in
 the issue that it is not cleared is not true afaik.

 Jörn

 On Wed, Apr 1, 2015 at 9:47 AM, tomm...@apache.org wrote:

  Author: tommaso
  Date: Wed Apr  1 07:47:41 2015
  New Revision: 1670574
 
  URL: http://svn.apache.org/r1670574
  Log:
  OPENNLP-764 - applied patch from Pablo Duboue, clearing adaptive data
  after doc processing
 
  Modified:
 
 
 opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
 
  Modified:
 
 opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
  URL:
 
 http://svn.apache.org/viewvc/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java?rev=1670574&r1=1670573&r2=1670574&view=diff
 
 
 ==
  ---
 
 opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
  (original)
  +++
 
 opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java
  Wed Apr  1 07:47:41 2015
  @@ -169,6 +169,8 @@ public final class NameFinder extends Ab
 documentConfidence.add(prob);
   }
 
  +mNameFinder.clearAdaptiveData();
  +
   return names;
 }
 
  @@ -210,4 +212,4 @@ public final class NameFinder extends Ab
 public void destroy() {
   mNameFinder = null;
 }
  -}
  \ No newline at end of file
  +}
 
 
 



Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-30 Thread Tommaso Teofili
Thanks Jörn, I'll follow your suggestion and change the test implementation
accordingly.
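
For the record, the round-trip style of test suggested below (serialize into
a byte buffer, read the object back, compare objects instead of bytes or
strings) looks roughly like this. It is a sketch only: Counts is a
hypothetical stand-in for the real model class, with its own trivial binary
format, so nothing here reflects the actual NGramModel serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class RoundTripSketch {

    // Hypothetical stand-in for a model that can write itself to a stream.
    static final class Counts {
        final Map<String, Integer> entries = new TreeMap<>();

        void serialize(DataOutputStream out) throws IOException {
            out.writeInt(entries.size());
            for (Map.Entry<String, Integer> entry : entries.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        }

        static Counts deserialize(DataInputStream in) throws IOException {
            Counts result = new Counts();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                result.entries.put(key, in.readInt());
            }
            return result;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Counts && entries.equals(((Counts) other).entries);
        }

        @Override
        public int hashCode() {
            return entries.hashCode();
        }
    }

    // Writes the object into an in-memory buffer and reads a fresh copy back.
    static Counts roundTrip(Counts original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            original.serialize(out);
            out.flush();
            return Counts.deserialize(
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Counts original = new Counts();
        original.entries.put("brown fox", 1);
        System.out.println(roundTrip(original).equals(original));
    }
}
```

Comparing objects this way keeps the test independent of attribute order,
whitespace or headers in the serialized form.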

Regards,
Tommaso

2015-01-29 13:16 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 In those serialization tests I usually write the Object into a byte
 buffer, create it again from the byte buffer and then compare the two
 objects, instead of the binary representation.

 Could that solve the problem we have in this test?

 Jörn

 On Thu, 2015-01-29 at 12:11 +0100, Tommaso Teofili wrote:
  I've just disabled that test, I'll fix it and re-enable it when done.
 
  Regards,
  Tommaso
 
  2015-01-29 10:51 GMT+01:00 Joern Kottmann kottm...@gmail.com:
 
   It still fails in the assert. I didn't check but I guess the build
   server has the same problem.
  
   Jörn
  
   On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote:
even after my latest commit? If so I'll rearrange the test a bit.
   
Tommaso
   
2015-01-29 10:21 GMT+01:00 Joern Kottmann kottm...@gmail.com:
   
 Or if that is a problem for the test, you could also tell RAT to
 ignore
 it.

 On my machine the test fails. The two strings don't match.

 Jörn

 On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
  right, thanks I'll fix both.
 
  Tommaso
 
  2015-01-29 9:54 GMT+01:00 Joern Kottmann kottm...@gmail.com:
 
   This file should have an AL header.
  
   Jörn
  
   On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
Added:
   
  

  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
URL:
   
  

  
 http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
   
  

  
 ==
---
   
  

  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
   (added)
+++
   
  

  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
   Thu Jan 29 08:02:31 2015
@@ -0,0 +1,58 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<dictionary case_sensitive="false">
+<entry count="1">
+<token>brown</token>
+<token>fox</token>
+</entry>
  
  
  



  
  
  





Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
even after my latest commit? If so I'll rearrange the test a bit.

Tommaso

2015-01-29 10:21 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 Or if that is a problem for the test, you could also tell RAT to ignore
 it.

 On my machine the test fails. The two strings don't match.

 Jörn

 On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
  right, thanks I'll fix both.
 
  Tommaso
 
  2015-01-29 9:54 GMT+01:00 Joern Kottmann kottm...@gmail.com:
 
   This file should have an AL header.
  
   Jörn
  
   On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
Added:
   
  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
URL:
   
  
 http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
   
  
 ==
---
   
  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
   (added)
+++
   
  
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
   Thu Jan 29 08:02:31 2015
@@ -0,0 +1,58 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<dictionary case_sensitive="false">
+<entry count="1">
+<token>brown</token>
+<token>fox</token>
+</entry>
  
  
  





Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
right, thanks I'll fix both.

Tommaso

2015-01-29 9:54 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 This file should have an AL header.

 Jörn

 On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
  Added:
 
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
  URL:
 
 http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
 
 ==
  ---
 
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
 (added)
  +++
 
 opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
 Thu Jan 29 08:02:31 2015
  @@ -0,0 +1,58 @@
  +<?xml version="1.0" encoding="UTF-8"?>
  +<dictionary case_sensitive="false">
  +<entry count="1">
  +<token>brown</token>
  +<token>fox</token>
  +</entry>





Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Tommaso Teofili
I've just disabled that test, I'll fix it and re-enable it when done.

Regards,
Tommaso

2015-01-29 10:51 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 It still fails in the assert. I didn't check but I guess the build
 server has the same problem.

 Jörn

 On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote:
  even after my latest commit? If so I'll rearrange the test a bit.
 
  Tommaso
 
  2015-01-29 10:21 GMT+01:00 Joern Kottmann kottm...@gmail.com:
 
   Or if that is a problem for the test, you could also tell RAT to ignore
   it.
  
   On my machine the test fails. The two strings don't match.
  
   Jörn
  
   On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote:
right, thanks I'll fix both.
   
Tommaso
   
2015-01-29 9:54 GMT+01:00 Joern Kottmann kottm...@gmail.com:
   
 This file should have an AL header.

 Jörn

 On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote:
  Added: opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml
  URL: http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml?rev=1655546&view=auto
  ==
  --- opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml (added)
  +++ opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml Thu Jan 29 08:02:31 2015
  @@ -0,0 +1,58 @@
  +<?xml version="1.0" encoding="UTF-8"?>
  +<dictionary case_sensitive="false">
  +<entry count="1">
  +<token>brown</token>
  +<token>fox</token>
  +</entry>



  
  
  





Re: Text Summarization module?

2015-01-22 Thread Tommaso Teofili
Hi Ram,

since your proposal got positive feedback, maybe you could create an issue
in Jira and attach the code / patch for discussion / review.

Regards,
Tommaso


Re: Word Sense Disambiguation

2015-01-19 Thread Tommaso Teofili
+1

Tommaso

2015-01-19 19:10 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 Hello,

 +1 from me to just go ahead and implement the proposed approach. One
 goal of this implementation will be to figure out the interface we want
 to have in OpenNLP for WSD.

 We can later extend OpenNLP with more implementations which are taking
 different approaches.

 Jörn

 On Thu, 2015-01-15 at 16:50 +0900, Anthony Beylerian wrote:
  Hello,
 
  I'm new here. I previously mentioned to Jörn that my colleagues and I are
  interested in helping to implement this component. We were thinking of
  starting with simple knowledge-based approaches; they do not yield high
  accuracy, but as a first step they are relatively simple. We would like
  your opinion.
 
  Pei also mentioned that cTAKES
  (http://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-wsd/, currently in
  very exploratory stages) and YTEX
  (https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08) are also
  just exploring WSD for the healthcare domain. It's also
  knowledge/ontology-based for now. It would be great to see OpenNLP
  support general-domain WSD.
 
  Best,
 
  Anthony
 





Re: Build changed opennlp/pom.xml moved to root directory

2014-11-20 Thread Tommaso Teofili
IMHO it was about time, thanks Jörn :-)

Regards,
Tommaso

2014-11-20 21:11 GMT+01:00 Joern Kottmann kottm...@gmail.com:

 Hello everybody,

 we changed the structure of the project slightly. The main pom.xml used
 to be located in opennlp/pom.xml. This was done because an Eclipse
 workspace can't have files at the root level. The Maven convention is to
 have the file at the root level. I think it is time to move this file to
 the root directory so that it no longer confuses Maven users (and maybe some
 tools) which expect the file in the root directory.

 Please let me know if there are any objections to this.

 To build OpenNLP from now on, just go to the trunk directory and type mvn
 install.

 Jörn




Re: What should we do with the SF models?

2014-10-29 Thread Tommaso Teofili
In my opinion the long term goal should be to work on training new, Apache2
licensed, ones and make them available to our users; it probably makes sense
to take the SF models offline in any case because as long as they are there
people will keep downloading and using them, as that's just much easier
than training new ones.
As a short term goal I agree we should give more visibility to instructions
on how to train new models using existing corpora.

My 2 cents,
Tommaso


2014-10-28 20:37 GMT+01:00 Gustavo Knuppe gustavoknu...@gmail.com:

 I believe that models are important for users, since not every user has
 access to appropriate data files to train basic models.

 My suggestion is to use an alternative service to host these models,
 like github, torrent or other file share service...

 Github is a good option since they don't have any quota or bandwidth
 limitation.

 Gustavo K.

 2014-10-28 15:19 GMT-02:00 Joern Kottmann kottm...@gmail.com:

  Hi all,
 
  OpenNLP always came with a couple of trained models which were ready to
  use for a few languages. The performance a user encounters with those
  models heavily depends on their input text.
 
  Especially the English name finder models which were trained on MUC 6/7
  data perform very poorly these days if run on current news articles and
  even worse on data which is not in the news domain.
 
  Anyway, we often get judged on how well OpenNLP works just based on the
  performance of those models (or maybe people who compare their NLP
  systems against OpenNLP just love to have OpenNLP perform badly).
 
  I think we are now at a point with those models where it is questionable
  if having them is still an advantage for OpenNLP. The SourceForge page
  is often blocked due to traffic limitations. We definitely have to act
  somehow.
 
  The old models definitely have some historic value and are used for
  testing the release.
 
  What should we do?
 
  We could take them offline and advise our users to train their own
  models on one of the various corpora we support. We could also do both
  and place a prominent link to our corpora documentation on the download
  page and in a less visible place a link to the historic SF models.
 
  Jörn
 
 



Re: Parsing with PCFGs

2014-10-18 Thread Tommaso Teofili
Ok, no problem.
In the meantime I've added the first PCFG implementation in the sandbox,
see http://svn.apache.org/r1632735

Regards,
Tommaso

2014-10-16 11:33 GMT+02:00 Rodrigo Agerri rage...@apache.org:

 Hello!

 No, unfortunately not :)

 Cheers,

 Rodrigo

 On Thu, Oct 16, 2014 at 9:20 AM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:
  Hi Rodrigo,
 
  thanks a lot for your input. Do you have any insights on the treeinsert
  algorithm [1] too?
 
  Thanks,
  Tommaso
 
  [1] :
 
 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.parsing
 
  2014-10-15 9:38 GMT+02:00 Rodrigo Agerri rage...@apache.org:
 
  Hi,
 
  The main algorithm (called chunking in the trunk) is based on
  Ratnaparkhi's work.
  It is best to read the paper directly.
 
  http://link.springer.com/article/10.1023/A:1007502103375
 
  This is a shift-reduce parser, a type which incidentally is becoming quite
  fashionable again. For example, Stanford CoreNLP recently released a
  shift-reduce parser themselves, as an alternative to their PCFG and
  lexicalized parsers.
 
  HTH,
 
  Rodrigo
 
  On Wed, Oct 15, 2014 at 9:32 AM, Tommaso Teofili
  tommaso.teof...@gmail.com wrote:
   Hi all,
  
   in a bit of spare time I sketched a basic implementation of (in-memory)
   probabilistic context-free grammars which, if properly trained, can be
   used to build the parse tree of a given sentence. However (also looking
   at the doc on the website), it's not completely clear what's already
   implemented in trunk. I see there are 2 algorithms for parsing; could
   someone shed some light on them? And possibly offer an opinion on adding
   PCFGs as an additional algorithm?
  
   Regards,
   Tommaso
 



Re: Parsing with PCFGs

2014-10-16 Thread Tommaso Teofili
Hi Rodrigo,

thanks a lot for your input. Do you have any insights on the treeinsert
algorithm [1] too?

Thanks,
Tommaso

[1] :
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.parsing

2014-10-15 9:38 GMT+02:00 Rodrigo Agerri rage...@apache.org:

 Hi,

 The main algorithm (called chunking in the trunk) is based on
 Ratnaparkhi's work.
 It is best to read the paper directly.

 http://link.springer.com/article/10.1023/A:1007502103375

 This is a shift-reduce parser, a type which incidentally is becoming quite
 fashionable again. For example, Stanford CoreNLP recently released a
 shift-reduce parser themselves, as an alternative to their PCFG and
 lexicalized parsers.

 HTH,

 Rodrigo

 On Wed, Oct 15, 2014 at 9:32 AM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:
  Hi all,
 
  in a bit of spare time I sketched a basic implementation of (in-memory)
  probabilistic context-free grammars which, if properly trained, can be
  used to build the parse tree of a given sentence. However (also looking
  at the doc on the website), it's not completely clear what's already
  implemented in trunk. I see there are 2 algorithms for parsing; could
  someone shed some light on them? And possibly offer an opinion on adding
  PCFGs as an additional algorithm?
 
  Regards,
  Tommaso



Parsing with PCFGs

2014-10-15 Thread Tommaso Teofili
Hi all,

in a bit of spare time I sketched a basic implementation of (in-memory)
probabilistic context-free grammars which, if properly trained, can be used
to build the parse tree of a given sentence. However (also looking at the
doc on the website), it's not completely clear what's already implemented in
trunk. I see there are 2 algorithms for parsing; could someone shed some
light on them? And possibly offer an opinion on adding PCFGs as an
additional algorithm?

Regards,
Tommaso
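
For readers who want a concrete picture of what an in-memory PCFG can look like, here is a minimal sketch. It is not the sandbox implementation referenced elsewhere in this thread; it assumes a grammar in Chomsky normal form and uses the textbook CKY (Viterbi) algorithm. The class name ToyPcfg, its methods, and the hand-written grammar are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory PCFG in Chomsky normal form; all names here are hypothetical. */
final class ToyPcfg {

    /** A rule parent -> left right (binary) or parent -> left (lexical, right == null). */
    private static final class Rule {
        final String parent, left, right;
        final double prob;

        Rule(String parent, String left, String right, double prob) {
            this.parent = parent;
            this.left = left;
            this.right = right;
            this.prob = prob;
        }
    }

    private final List<Rule> binaryRules = new ArrayList<>();
    private final List<Rule> lexicalRules = new ArrayList<>();

    /** Adds a binary rule parent -> left right with the given probability. */
    ToyPcfg binary(String parent, String left, String right, double prob) {
        binaryRules.add(new Rule(parent, left, right, prob));
        return this;
    }

    /** Adds a lexical rule parent -> word with the given probability. */
    ToyPcfg lexical(String parent, String word, double prob) {
        lexicalRules.add(new Rule(parent, word, null, prob));
        return this;
    }

    /** CKY (Viterbi): probability of the best parse rooted in start, or 0.0 if there is none. */
    double bestParseProbability(String[] words, String start) {
        int n = words.length;
        if (n == 0) {
            return 0.0;
        }
        // chart.get(i).get(j) maps a nonterminal to its best probability over words[i..j].
        List<List<Map<String, Double>>> chart = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Map<String, Double>> row = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                row.add(new HashMap<>());
            }
            chart.add(row);
        }
        // Spans of length 1 come from lexical rules.
        for (int i = 0; i < n; i++) {
            for (Rule r : lexicalRules) {
                if (r.left.equals(words[i])) {
                    chart.get(i).get(i).merge(r.parent, r.prob, Math::max);
                }
            }
        }
        // Longer spans combine two adjacent sub-spans through a binary rule.
        for (int span = 2; span <= n; span++) {
            for (int i = 0; i + span <= n; i++) {
                int j = i + span - 1;
                for (int k = i; k < j; k++) {
                    for (Rule r : binaryRules) {
                        Double l = chart.get(i).get(k).get(r.left);
                        Double rt = chart.get(k + 1).get(j).get(r.right);
                        if (l != null && rt != null) {
                            chart.get(i).get(j).merge(r.parent, r.prob * l * rt, Math::max);
                        }
                    }
                }
            }
        }
        return chart.get(0).get(n - 1).getOrDefault(start, 0.0);
    }

    /** Hand-written toy grammar; a trained grammar would estimate these numbers from a treebank. */
    static ToyPcfg example() {
        return new ToyPcfg()
            .binary("S", "NP", "VP", 1.0)
            .binary("NP", "DT", "NN", 1.0)
            .binary("VP", "VB", "NP", 1.0)
            .lexical("DT", "the", 1.0)
            .lexical("NN", "fox", 0.5)
            .lexical("NN", "dog", 0.5)
            .lexical("VB", "chases", 1.0);
    }

    public static void main(String[] args) {
        String[] sentence = "the fox chases the dog".split(" ");
        System.out.println(example().bestParseProbability(sentence, "S"));
    }
}
```

With this toy grammar the sentence "the fox chases the dog" gets probability 0.25 (0.5 for each noun choice), while a word order the grammar cannot derive gets 0.0. Recovering the actual parse tree would additionally require back-pointers in the chart, which are left out to keep the sketch short.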


Re: Pluggable Machine Learning support

2013-05-31 Thread Tommaso Teofili
big +1!

Tommaso


2013/5/31 William Colen william.co...@gmail.com

 I don't see any issue. People who use Maxent directly would need to
 change how they use it, but that is OK for a major release.




 On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann kottm...@gmail.com wrote:

  Are there any objections to moving the maxent/perceptron classes to an
  opennlp.tools.ml package as part of this issue? Moving them would avoid a
  second interface layer and probably make using OpenNLP Tools a bit easier,
  because then we are down to a single jar.
 
  Jörn
 
 
  On 05/30/2013 08:57 PM, William Colen wrote:
 
  +1 to add pluggable machine learning algorithms
  +1 to improve the API and remove deprecated methods in 1.6.0
 
  You can assign related Jira issues to me and I will be glad to help.
 
 
  On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann kottm...@gmail.com
  wrote:
 
   Hi all,
 
  we spoke about it here and there already: to ensure that OpenNLP can stay
  competitive with other NLP libraries, I am proposing to make the machine
  learning pluggable.
 
  The extensions should not make it harder to use OpenNLP. If a user loads a
  model, OpenNLP should be capable of setting up everything by itself without
  forcing the user to write custom integration code based on the ml
  implementation.
  We solved this problem already with the extension mechanism we built to
  support the customization of our components. I suggest that we reuse this
  extension mechanism to load an ml implementation. To use a custom ml
  implementation the user has to specify the class name of the factory in the
  Algorithm field of the params file. The params file is available during
  training and tagging time.
 
  Most components in the tools package use the maxent library to do
  classification. The Java interfaces for this are currently located in the
  maxent package; to be able to swap the implementation, the interfaces should
  be defined inside the tools package. To make things easier I propose to
  move the maxent and perceptron implementations as well.
 
  Throughout the code base we use the AbstractModel. That's a bit unfortunate,
  because the only reason for this is the lack of model serialization support
  in the MaxentModel interface. A serialization method should be added to it,
  and it should maybe be renamed to ClassificationModel. This will
  break backward compatibility in non-standard use cases.
 
  To be able to test the extension mechanism I suggest that we implement an
  addon which integrates liblinear and the Apache Mahout classifiers.
 
  There are still a few deprecated 1.4 constructors and methods in OpenNLP
  which directly reference interfaces and classes in the maxent library;
  these need to be removed to be able to move the interfaces to the tools
  package.
 
  Any opinions?
 
  Jörn
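
The idea of naming a factory class in the Algorithm field of the params can be illustrated with a small sketch. This is not OpenNLP's actual extension mechanism or API; the interfaces ClassificationModel and TrainerFactory, the placeholder ConstantTrainerFactory, and the "Outcome" parameter are hypothetical and exist only to show class-name based loading via plain Java reflection.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of loading an ML implementation by factory class name; all types here are hypothetical. */
final class PluggableMlSketch {

    /** Hypothetical stand-in for the classification interface discussed in the thread. */
    interface ClassificationModel {
        String bestOutcome(String[] context);
    }

    /** Hypothetical factory that a custom ML implementation would provide. */
    interface TrainerFactory {
        ClassificationModel train(Map<String, String> params);
    }

    /** Placeholder implementation that always predicts the outcome named in the params. */
    static final class ConstantTrainerFactory implements TrainerFactory {
        public ConstantTrainerFactory() {
        }

        @Override
        public ClassificationModel train(Map<String, String> params) {
            // "Outcome" is an invented parameter used only by this placeholder.
            String outcome = params.getOrDefault("Outcome", "other");
            return context -> outcome;
        }
    }

    /** Instantiates the factory named in the "Algorithm" field, or a default if the field is absent. */
    static TrainerFactory loadFactory(Map<String, String> params) {
        String className = params.get("Algorithm");
        if (className == null) {
            return new ConstantTrainerFactory();
        }
        try {
            return Class.forName(className)
                .asSubclass(TrainerFactory.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load ML factory: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("Algorithm", ConstantTrainerFactory.class.getName());
        params.put("Outcome", "person");
        ClassificationModel model = loadFactory(params).train(params);
        System.out.println(model.bestOutcome(new String[] {"w=Tommaso"}));
    }
}
```

Because the same params are available at training and tagging time, the user only changes a configuration value and writes no integration code, which is the property the proposal asks for. A class name that does not implement the factory interface is rejected with an IllegalArgumentException instead of failing later.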
 
 
 



Re: How to make RC2?

2013-03-08 Thread Tommaso Teofili
Hi Jorn,

as far as I know, if you have already run mvn release:perform you have to do
that manually (too bad, I know); otherwise, if you have only run
release:prepare, you can safely run mvn release:rollback and have everything
as before.

Regards,
Tommaso



2013/3/8 Jörn Kottmann kottm...@gmail.com

 Hi all,

 the question came up on how the RC1 we just did can be rolled back to make
 RC2.

 To do that, the versions in the poms all need to be changed back to
 1.5.3-SNAPSHOT and 3.0.3-SNAPSHOT, respectively.
 I always did that manually in Eclipse with the search and replace tool.

 The Maven release plugin's rollback can only be used after the prepare
 stage, but not after release:perform.

 HTH,
 Jörn



Re: Migrate to Git?

2012-12-20 Thread Tommaso Teofili
in my opinion that would be good, +1
Tommaso


2012/12/19 Jörn Kottmann kottm...@gmail.com

 Hi all,

 I heard at ApacheCon Europe that it should be possible to migrate from
 Subversion to Git.

 Is there any interest in doing that? If we decide to do it, I suggest we
 wait until the 1.5.3 release is done so we have a bit of time to also
 migrate our build process.

 Do all committers have experience with Git?

 Jörn



Re: how to train with CorpusServer

2012-06-21 Thread Tommaso Teofili
Thanks Jörn, I'll just try the tools until the guide is ready.
Tommaso

2012/6/13 Jörn Kottmann kottm...@gmail.com

 Hello,

 no, there is no such guide or tutorial yet.
 I will write a getting started page for it and add it to our wiki.
 But the tools are there and now also much easier to use, because
 some required components can now be used in their released versions.

 Jörn


 On 06/12/2012 02:30 PM, Tommaso Teofili wrote:

 Hi all,
 reading [1] and going back to OPENNLP-385 [2], I wonder if a guide/wiki on
 how to use the CorpusServer (possibly with the UIMA CAS Editor) to train
 data exists, or whether it would be possible to create one.
 I remember there was a more generic wiki page [3], but perhaps a
 step-by-step guide could be useful too.
 Regards,
 Tommaso

 [1] : https://issues.apache.org/jira/browse/LUCENE-2899
 [2] : https://issues.apache.org/jira/browse/OPENNLP-385
 [3] : https://cwiki.apache.org/OPENNLP/opennlp-annotations.html