Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Dan Russ
My (not so thorough, but probably different than yours) tests passed. I
vote:

+1

On Thu, May 18, 2017 at 1:42 PM, Jeff Zemerick  wrote:

> +1 non-binding
>
> Built and tested a token name finder model on Ubuntu 16.04 and Amazon
> Linux 2017.03.0
> with OpenJDK8.
>
>
> On Thu, May 18, 2017 at 11:17 AM, Joern Kottmann 
> wrote:
>
> > @Richard, it would be nice if you could vote as well so we know that what
> > we have now in RC 3 works for you.
> >
> > Jörn
> >
> > On Thu, May 18, 2017 at 4:56 PM, William Colen 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Successfully executed complete evaluation tests in source deliverable.
> > > Tried it with DKPro and after updating the Lemmatizer and Chunker usage
> > > there were two test failures that we could trace back to issues fixed in
> > > OPENNLP-125 and OPENNLP-989 that would affect evaluation results.
> > >
> > >
> > >
> > > 2017-05-18 10:08 GMT-03:00 Tommaso Teofili  >:
> > >
> > > > +1 (binding)
> > > >
> > > > Regards,
> > > > Tommaso
> > > >
> > > > p.s.:
> > > >
> > > > +1 also to Bruno's side comments
> > > >
> > > > On Thu, 18 May 2017 at 12:43, Bruno P. Kinoshita
> > > >  wrote:
> > > >
> > > > >
> > > > > [ X ] +1 Release the packages as Apache OpenNLP 1.8.0
> > > > >
> > > > > Not binding
> > > > >
> > > > > Side note: would be nice later to start fixing some issues found via
> > > > > FindBugs. Running `mvn clean findbugs:findbugs findbugs:gui` shows several
> > > > > errors, some seem important, like using equals() for array objects (which
> > > > > will always be false).
> > > > >
> > > > > See
> > > > >
> > > > >
> > > > > https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/TokenTag.java#L85
> > > > >
> > > > > And
> > > > >
> > > > >
> > > > >
> > > > > https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/POSTaggerNameFeatureGenerator.java#L59
> > > > > Plus other NullPointerException's that can be prevented, and other minor
> > > > > issues. Not blockers for the release though, IMO.
> > > > >
> > > > > Cheers
> > > > > Bruno
> > > > >
> > > > >
> > > > > 
> > > > > From: Joern Kottmann 
> > > > > To: dev@opennlp.apache.org
> > > > > Sent: Thursday, 18 May 2017 9:49 AM
> > > > > Subject: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3
> > > > >
> > > > >
> > > > >
> > > > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > > > >
> > > > > 1.8.0 Release Candidate 3.
> > > > >
> > > > >
> > > > > The RC 3 distributables can be downloaded from here:
> > > > >
> > > > > https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/
> > > > >
> > > > >
> > > > > The release was made from the Apache OpenNLP 1.8.0 tag at
> > > > >
> > > > > https://github.com/apache/opennlp/tree/opennlp-1.8.0
> > > > >
> > > > >
> > > > >
> > > > > To use it in a maven build set the version for opennlp-tools or
> > > > >
> > > > > opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> > > > >
> > > > > file:
> > > > >
> > > > > https://repository.apache.org/content/repositories/orgapacheopennlp-1013
> > > > >
> > > > >
> > > > >
> > > > > The release was made using the OpenNLP release process, documented on
> > > > >
> > > > > the Wiki here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> > > > >
> > > > >
> > > > >
> > > > > The release contains quite some changes, please refer to the contained
> > > > >
> > > > > issue list for details.
> > > > >
> > > > >
> > > > >
> > > > > Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> > > > >
> > > > > vote is open for at least the next 72 hours.
> > > > >
> > > > >
> > > > >
> > > > > Only votes from OpenNLP PMC are binding, but folks are welcome to check
> > > > >
> > > > > the release candidate and voice their approval or disapproval. The vote
> > > > >
> > > > > passes if at least three binding +1 votes are cast.
> > > > >
> > > > >
> > > > >
> > > > > [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> > > > >
> > > > > [ ] -1 Do not release the packages because...
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > >
> > > > > Jörn
> > > > >
> > > > >
> > > > > P.S. Here is my +1.
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Dan Russ
+1

> On Jun 27, 2017, at 9:28 AM, William Colen  wrote:
> 
> +1
> 
> 
> 2017-06-27 9:35 GMT-03:00 Suneel Marthi :
> 
>> +1
>> 
>> Sent from my iPhone
>> 
>> On 27/06/2017 at 8:22 AM, Jeff Zemerick  wrote:
>> 
>>> +1
>>> 
>>> On Tue, Jun 27, 2017 at 6:53 AM, Rodrigo Agerri 
>>> wrote:
>>> 
 +1
 
 R
 
> On Tue, Jun 27, 2017 at 12:46 PM, Mark G 
>> wrote:
> 
> +1
> 
> Sent from my iPhone
> 
>> On Jun 27, 2017, at 6:30 AM, Joern Kottmann 
 wrote:
>> 
>> +1
>> 
>> Jörn
>> 
>>> On Tue, Jun 27, 2017 at 12:30 PM, Joern Kottmann >> 
> wrote:
>>> Hello all,
>>> 
>>> let's decide here if we want to move our main repository, currently
>>> hosted at Apache to GitHub instead. This will make our process a bit
>>> easier because we can eliminate one remote from our workflow.
>>> 
>>>   [ ] +1 Migrate all repositories to GitHub
>>>   [ ] -1 Do not migrate,  because...
>>> 
>>> Thanks,
>>> Jörn
> 
 
>> 



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-27 Thread Dan Russ
Hi All,
   First, let me take a share of the blame for the comment Chris mentioned.  I 
believe I said something like the pull request was X revisions behind and Y 
revisions ahead.  It was not meant to be rude; it was meant to say that it is hard 
to review code when it is so different from the current code base. I am very 
excited that sentiment analysis is going to be added to OpenNLP, but I have not 
had time to play with it. If I were to say “great job” before I have had a 
chance to look at it, it would be flattery, not honest praise.

  Let’s clean up the merge.  I agree with Chris that scalability and perfection 
should not be our initial goals.  Let’s get something working, and we can decide how to 
optimize later (even if it requires a complete rewrite).  Perfection is the 
enemy of the good.

  Finally, because of Chris’ comments it is hard to thank Ana and Chris without 
sounding insincere.  But I’ll try: thank you, Chris and Ana.  I hope we can get 
beyond this and that Chris and Ana will continue to improve the performance of 
the sentiment analysis tool and happily remain part of the OpenNLP family.  It 
is also a good time to toss a big thank you to all of the committers, users, 
and PMC members.  I use OpenNLP almost every day.  Your work is extremely 
valuable to me.

Thank you,
Daniel

> On Jun 27, 2017, at 10:25 AM, Chris Mattmann  wrote:
> 
> Hi everyone,
> 
> I spoke with Joern in Slack. Some of his concerns are:
> 
> 1. This was done with a Merge commit and apparently they squash and rebase. 
> [would be helpful to see some pointer on this for documentation, thus far I 
> haven’t found any]
> 2. Apparently we literally need to ask others for +1 votes and record them 
> before committing? I thought since Ana and I are committers and were +1, 
> and since Joern had been providing feedback (the last of which was to add
> tests, which we did) that he would be +1 as well (I guess he is not, and I 
> guess
> formally we need to do a +1 vote even still)
> 3. There was concern about scalability of the code.
> 4. There are thoughts that the code was not perfect yet (even though it works
> fine in the MEMEX project for Ana and I)
> 
> So, Joern has opened up a revert PR. 
> 
> I suppose I should state I find this process extremely heavyweight and 
> unwelcoming.
> To me, there should be a modicum of trust for committers, but I feel like 
> even as a 
> committer, I am operating as a “contributor” to the project. Committer means 
> that
> there is trust to modify the source code base. Of the issues above, the only 
> one I see
> as a moderate snafu was #1, and frankly if there are some instructions that 
> show me
> how to do squashing and rebasing *first* I will try to do that in the future 
> since I am
> not a Git expert. 
> 
> That said, I must state I feel pretty put off by Apache OpenNLP. This 
> originated as a GSoC 
> effort, and we have worked pretty consistently on this over the last year. We 
> used a
> separate GitHub project to get started, kept Joern involved as another 
> mentor, even
> provided access and commit rights to that GitHub repository for a long time, 
> so this
> code was developed in the open. Joern even created a branch in ApacheOpenNLP 
> in the code and I suppose
> I should have gone and worked on that branch first since master is apparently 
> so 
> pristine that even an Apache veteran like me can’t get something in to it 
> without 
> making a whole bunch of (what are IMO minor issues, and what are IMO 
> heavyweight
> “community” issues). 
> 
> I am concerned from a community point of view that the first comment wasn’t 
> “Great
> job Chris, you got Sentiment Analysis into Apache, *but* I have these 
> concerns 1-4 above”.
> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
> 
> That’s pretty off-putting to someone who is semi-new like me and like Ana.
> 
> Anyways, go ahead and revert it. Sorry to have caused any issues. 
> 
> Chris
> 
> 
> 
> On 6/27/17, 7:06 AM, "Chris Mattmann"  wrote:
> 
>Hi Joern,
> 
>I’m confused. Why did you revert my commit?
> 
>Every one of those check points you put on the PR was checked?
>We have been discussing this for months, you have seen the 
>code for months, Ana and I have worked diligently on the code
>in plain view of everyone.
> 
>Please explain.
> 
>Chris
> 
> 
> 
> 
>On 6/27/17, 1:23 AM, "kottmann"  wrote:
> 
>GitHub user kottmann opened a pull request:
> 
>https://github.com/apache/opennlp/pull/238
> 
>Revert merging of sentiment work, no consent to merge it
> 
>Thank you for contributing to Apache OpenNLP.
> 
>In order to streamline the review of the contribution we ask you
>to ensure the following steps have been taken:
> 
>### For all changes:
>- [ ] Is there a JIRA ticket associated with this PR? Is it 
> referenced 
> in the commit message?
> 
>- [ ] Does your P

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Dan Russ
+1 binding

> On Jul 7, 2017, at 1:26 PM, Jeff Zemerick  wrote:
> 
> +1 non-binding
> 
> Builds and tests
> High mem tests pass
> Eval tests pass
> 
> 
> 
> 
> 
> On Fri, Jul 7, 2017 at 11:13 AM, Richard Eckart de Castilho 
> wrote:
> 
>> Hi all,
>> 
>> 
>>> On 05.07.2017, at 15:21, Suneel Marthi  wrote:
>>> 
>>> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>> 1.8.1
>>> Release Candidate 3.
>> 
>> I ran a DKPro Core build against the new RC3. All shiny :)
>> Thanks for the new version!
>> 
>> +1 from me (non-binding and only based on external unit-testing,
>>not reviewing artifacts).
>> 
>> -- Richard
>> 
>> 



Re: Dictionary

2017-07-21 Thread Dan Russ
Hi Manoj,
The format has been around for a long time.  While I don’t think it 
predates XML, XML was probably not as ubiquitous then as it is today.  However, it 
really should not be a stumbling block for you.  I believe all you need to do 
is read in the data and get the spans of the names.  One other point: OpenNLP 
has the concept of a dictionary.  Have you looked into 
opennlp.tools.dictionary.Dictionary and 
opennlp.tools.dictionary.DictionarySerializer?  It looks like you want to create 
a DictionarySerializer that can read your format.
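
A minimal sketch of that route, assuming a plain file ("names.txt") with one name 
per line (both the file name and the layout are my assumptions): read each line 
into a Dictionary and hand it to DictionaryNameFinder.

import java.io.BufferedReader;
import java.io.FileReader;
import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.namefind.DictionaryNameFinder;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;
import opennlp.tools.util.StringList;

public class DictionaryFromPlainFile {
    public static void main(String[] args) throws Exception {
        // Build a Dictionary from a plain file with one name per line.
        Dictionary dict = new Dictionary();
        try (BufferedReader reader = new BufferedReader(new FileReader("names.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    // Tokenize on whitespace so multi-word names work as well.
                    dict.put(new StringList(WhitespaceTokenizer.INSTANCE.tokenize(line)));
                }
            }
        }

        // Look up dictionary entries in a tokenized sentence.
        DictionaryNameFinder finder = new DictionaryNameFinder(dict, "person");
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize("manoj met sandeep yesterday .");
        for (Span span : finder.find(tokens)) {
            System.out.println(span);  // spans of the dictionary hits
        }
    }
}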

   One last point: this question is probably better asked on the user 
list.  Most of the developers are subscribed to both the user and dev 
lists.

Hope it helps,
Daniel


> On Jul 21, 2017, at 6:54 AM, Manoj B. Narayanan 
>  wrote:
> 
> Hi Jim,
> Thanks for replying. Could you be more specific please.
> 
> These are the things that I am aware of:
> 1. The training data can be of the form <START:person> Pierre Vinken <END>
> is a good example .
> 2. Currently I use a file in the below format and create a 'Dictionary'
> from it.
>This is the format
> 
> vinayak
>> 
>> rakesh
>> 
>> sandeep
>> 
>> manoj
>> 
>> 
> And use this dictionary in the DictionaryNameFinder.
> 
> I would like to know the advantages of using this format. Is there any
> other formats available?
> 
> Could you please explain more.
> 
> Thanks.
> Manoj
> 
> On Fri, Jul 21, 2017 at 3:56 PM, Jim O'Regan  wrote:
> 
>> 2017-07-19 10:48 GMT+01:00 Manoj B. Narayanan <
>> manojb.narayanan2...@gmail.com>:
>> 
>>> Hi all,
>>> 
>>> I wanted to find out if there is any specific reason behind using XML
>>> format for dictionaries for Name Finder.
>>> 
>> 
>> It's not XML. There is a very superficial similarity in the use of <>, but,
>> at a minimum
>> <START:person> Pierre Vinken <END>
>> would need to be something like
>> <START:person> Pierre Vinken </START:person>
>> and the whole document would need to be enclosed by a pair of tags.
>> 
>> 
>>> Also, is there any source from where we can get the documentation
>> regarding
>>> the dictionary formats for various tools (tokenizer, pos, name finder).
>>> 
>> 
>> The manual: https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html
>> More specifically,
>> tokeniser:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.
>> html#tools.tokenizer.training
>> pos:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.
>> html#tools.postagger.training
>> name finder:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.
>> html#tools.namefind.training
>> 



Re: Request for DEV Mailing List.

2017-07-23 Thread Dan Russ
Hi Chaitanya,
Go to
http://opennlp.apache.org/mailing-lists.html it has instructions on how to
sign up for the dev and user mail lists.
Daniel


On Jul 22, 2017 11:55 AM, "Chaitanya Karmarkar" <
chaitanya.karmarkar...@gmail.com> wrote:

Hello,

To whom it may concern.

I would like to keep myself up to date about the progress on OpenNLP
project.

Thanks
Chaitanya Karmarkar


Re: How can I find pos train data of opennlp

2017-08-04 Thread Dan Russ
VK,
   Please send this to us...@opennlp.apache.org  
not the dev list.  
Thanks
Daniel

> On Aug 3, 2017, at 10:01 PM, VK  wrote:
> 
> Hello, I'm using your opennlp to analyze words in a sentence. It's cool and
> work nicely.
> But some word, opennlp tag inappropriate part of speech. So I want to add
> sentence with appropriate POS.
> How can I find the POS training data file of opennlp to append my sentence ?
> Thank you very much.



Re: Parser vs Chunker

2017-08-18 Thread Dan Russ
Hello Manoj,

This is more a job for Wikipedia than opennlp’s dev mail list.  
https://en.wikipedia.org/wiki/Parsing   
https://en.wikipedia.org/wiki/Shallow_parsing 


Essentially, “parsing” is a generic term for taking input text and 
breaking it up into parts using certain rules (Wikipedia refers to these rules as a 
grammar).  Think of Java's Integer.parseInt(String s).  OpenNLP has 
tokenizers that “parse” strings into constituent words based on 
whitespace (WhitespaceTokenizer) or a statistically trained model 
(TokenizerME).  A Chunker, on the other hand, takes the constituent words and 
puts them together to make a larger construct (think of a phrase).  So, if 
you want to get noun or verb phrases, use a chunker.  It is also very useful if 
you are interested in identifying relationships between words.  I believe the 
Stanford NLP dependencies use chunking; for more info on that see 
https://nlp.stanford.edu/software/stanford-dependencies.shtml#English .  If I 
am wrong about the Stanford Dependencies, maybe someone will correct me...
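
As a rough sketch of the tokenize, POS-tag, chunk pipeline (the model file names 
below are the old 1.5 SourceForge models and are only an assumption on my part):

import java.io.FileInputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class ChunkerSketch {
    public static void main(String[] args) throws Exception {
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(
            "The quick brown fox jumps over the lazy dog .");

        try (FileInputStream posIn = new FileInputStream("en-pos-maxent.bin");
             FileInputStream chunkIn = new FileInputStream("en-chunker.bin")) {
            // POS tags are required input for the chunker.
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));
            String[] tags = tagger.tag(tokens);

            // Chunk tags come back as B-NP, I-NP, B-VP, O, ...
            ChunkerME chunker = new ChunkerME(new ChunkerModel(chunkIn));
            String[] chunks = chunker.chunk(tokens, tags);

            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + tags[i] + "\t" + chunks[i]);
            }
        }
    }
}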


Hope it helps,
Daniel

   

> On Aug 18, 2017, at 9:50 AM, Manoj B. Narayanan 
>  wrote:
> 
> Hi,
> 
> Could someone help me with this please ?
> 
> Thanks,
> Manoj.
> 
> On Tue, Aug 8, 2017 at 1:16 PM, Manoj B. Narayanan <
> manojb.narayanan2...@gmail.com> wrote:
> 
>> Hi,
>> 
>> Can some one please explain the difference between Parser and Chunker in
>> OpenNLP.
>> I think we can get the same output of the Parser from Chunker output
>> itself.
>> Please correct me if I am wrong.
>> 
>> Thanks.
>> Manoj.
>> 



Re: Early stopping NameFinderME

2017-08-24 Thread Dan Russ
Can't you set the number of iterations in the training parameters?
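
A sketch of what that looks like, assuming name-finder training data in a file I'm 
calling train.txt; note that a fixed iteration cap is not the same thing as 
convergence-based early stopping.

import java.io.File;
import java.nio.charset.StandardCharsets;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainWithIterationCap {
    public static void main(String[] args) throws Exception {
        ObjectStream<NameSample> samples = new NameSampleDataStream(
            new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train.txt")),
                StandardCharsets.UTF_8));

        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.ITERATIONS_PARAM, "150");  // stop after 150 iterations
        params.put(TrainingParameters.CUTOFF_PARAM, "1");

        TokenNameFinderModel model = NameFinderME.train(
            "en", "person", samples, params, new TokenNameFinderFactory());
        samples.close();
        // ... serialize or evaluate the model
    }
}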

On Aug 24, 2017 4:48 AM, "Joern Kottmann"  wrote:

> You are the first one who ever asked this question. I think we have this as
> an option already on the gis trainer but it is not exposed all the way
> through.
>
> Please open a jira and I can look at it next week.
>
> Jörn
>
> On Aug 21, 2017 5:11 PM, "Saurabh Jain"  wrote:
>
> > Hi All
> >
> > How can we use early stopping while training/crossvalidating custom data
> > with NameFinder ? What I want if change in likelihood value or accuracy
> of
> > model is less than 0.05 between two steps (differ by 5 i.e compare x+5
> step
> > output with x step) then training should stop. I could not find anything
> > regarding this in documentation. Can some one please help ?
> >
> > --
> > *Thanks & Regards*
> >
> >
> > *Saurabh Jain *
> > *AI Developer*
> >
> > *Active Intelligence  *
> >
> > *"*
> > *To do a thing yesterday was the best time . Second best time is today
> .” *
> >
>


Re: Early stopping NameFinderME

2017-08-29 Thread Dan Russ
Hi Jörn,
   I don’t see a problem with it.  Make sure the default is set to the current 
value.  Are you making the fix?  I could get to it later tonight.
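
A sketch of what the user-side call might look like once it is exposed; the key 
"LLThreshold" and the value below are hypothetical, not an existing training 
parameter, and the default should match whatever GISTrainer currently hard-codes.

import opennlp.tools.util.TrainingParameters;

public class HypotheticalLLThresholdParams {
    public static TrainingParameters build() {
        TrainingParameters params = TrainingParameters.defaultParams();
        // "LLThreshold" is a hypothetical parameter key; the default value should
        // equal the constant currently hard-coded in GISTrainer so that existing
        // behavior is unchanged.
        params.put("LLThreshold", "0.0001");
        return params;
    }
}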
Daniel

> On Aug 29, 2017, at 10:32 AM, Joern Kottmann  wrote:
> 
> Hi Daniel,
> 
> do you see any issue if we expose LLThreshold and allow the user to
> change it via training parameters?
> 
> Jörn
> 
> On Sat, Aug 26, 2017 at 1:07 AM, Daniel Russ  wrote:
>> Jörn,
>> 
>>   Currently, GISTrainer has a private static final variable LLThreshold, 
>> which controls if the change in the log likelihood between two iterations is 
>> too small.  We could make this a parameter. I am concerned about using the 
>> accuracy to train the model.  If we use accuracy, the weight space may be 
>> flat.
>> 
>>   Saurabh, you use the term “early stopping”.  In deep learning, early 
>> stopping is used to prevent overtraining and improve generalization to 
>> unseen data.  I am not sure early stopping serves the same purpose with GIS 
>> training.  Does anyone know if early stopping improves generalization for a 
>> maxent problem?
>> 
>> Daniel
>> 
>>> On Aug 24, 2017, at 4:48 AM, Joern Kottmann  wrote:
>>> 
>>> You are the first one who ever asked this question. I think we have this as
>>> an option already on the gis trainer but it is not exposed all the way
>>> through.
>>> 
>>> Please open a jira and I can look at it next week.
>>> 
>>> Jörn
>>> 
>>> On Aug 21, 2017 5:11 PM, "Saurabh Jain"  wrote:
>>> 
 Hi All
 
 How can we use early stopping while training/crossvalidating custom data
 with NameFinder ? What I want if change in likelihood value or accuracy of
 model is less than 0.05 between two steps (differ by 5 i.e compare x+5 step
 output with x step) then training should stop. I could not find anything
 regarding this in documentation. Can some one please help ?
 
 --
 *Thanks & Regards*
 
 
 *Saurabh Jain *
 *AI Developer*
 
 *Active Intelligence  *
 
 *"*
 *To do a thing yesterday was the best time . Second best time is today .” *
 
>> 



Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-25 Thread Dan Russ
+1 burrito

Ran unit tests on my downstream code that uses opennlp-tools.

> On Oct 25, 2017, at 6:58 AM, Suneel Marthi  wrote:
> 
> +1 binding
> 
> 1. Verified Sigs and hashes
> 2. Ran a clean build from {src} * {zip, tar}
> 3. All unit tests pass
> 
> On Wed, Oct 25, 2017 at 3:08 PM, Bruno P. Kinoshita <
> brunodepau...@yahoo.com.br.invalid> wrote:
> 
>> [ X ] +1 Release the packages as Apache OpenNLP 1.8.3
>> 
>> `mvn clean test install` working fine, checked artefacts signatures,
>> matching with what was in the vote e-mail.
>> 
>> Currently on tag 1.8.3, commit b317159cb9857dc509c08a31a98dc61209f39bff
>> 
>> Thanks for preparing this release.
>> 
>> Cheers
>> Bruno
>> 
>> 
>> 
>> 
>> From: Suneel Marthi 
>> To: dev@opennlp.apache.org; us...@opennlp.apache.org
>> Sent: Tuesday, 24 October 2017 10:29 PM
>> Subject: [VOTE] Apache OpenNLP 1.8.3 Release Candidate
>> 
>> 
>> 
>> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>> 
>> 1.8.3 Release Candidate.
>> 
>> 
>> The Release artifacts can be downloaded from:
>> 
>> 
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1010/org/apache/opennlp/opennlp-distr/1.7.2/
>> 
>> 
>> The release was made from the Apache OpenNLP 1.8.3 tag at
>> 
>> 
>> https://github.com/apache/opennlp/tree/opennlp-1.8.3
>> 
>> 
>> To use it in a maven build set the version for opennlp-tools or
>> opennlp-uima
>> 
>> to 1.8.3
>> 
>> 
>> and add the following URL to your settings.xml file:
>> 
>> 
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1019/org/apache/opennlp/opennlp-distr/1.8.3/
>> 
>> 
>> The artifacts have been signed with the Key - D3541808 found at
>> 
>> 
>> http://people.apache.org/keys/group/opennlp.asc
>> 
>> 
>> Please vote on releasing these packages as Apache OpenNLP 1.8.3. The vote
>> is
>> 
>> 
>> open for either the next 72 hours or a minimum of 3 +1 PMC binding votes
>> 
>> whichever happens earlier.
>> 
>> 
>> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>> 
>> 
>> release candidate and voice their approval or disapproval. The vote passes
>> 
>> 
>> if at least three binding +1 votes are cast.
>> 
>> 
>> [ ] +1 Release the packages as Apache OpenNLP 1.8.3
>> 
>> 
>> [ ] -1 Do not release the packages because...
>> 
>> 
>> Thanks again to all the committers and contributors for their work
>> 
>> over the past
>> 
>> few weeks.
>> 



Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-21 Thread Dan Russ
[ X] +1 Release the packages as Apache OpenNLP 1.8.4

> On Dec 21, 2017, at 9:44 AM, Jeff Zemerick  wrote:
> 
> Hi Folks,
> 
> I have posted a first release candidate for the Apache OpenNLP 1.8.4
> release and it is ready for testing.
> 
> The RC1 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1020/org/apache/opennlp/opennlp-distr/1.8.4
> 
> The release was made from the Apache OpenNLP 1.8.4 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.4
> 
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.4 and add the following URL to your settings.xml file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1020
> 
> The release was made using the OpenNLP release process, documented on the
> Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
> 
> The release contains quite some changes, please refer to the contained
> issue list for details.
> 
> Please vote on releasing these packages as Apache OpenNLP 1.8.4. The vote is
> open for at least the next 72 hours.
> 
> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
> 
> [ ] +1 Release the packages as Apache OpenNLP 
> [ ] -1 Do not release the packages because...
> 
> Thanks!
> Jeff Zemerick



Re: Model file

2018-01-18 Thread Dan Russ
Have you looked at either the BinaryGISModelWriter or the corresponding Reader?  It’s fairly 
simple, something like 


For models trained with the GISTrainer…

GIS
# of outcomes
 
# of predictors
Predictor-1 # of outcomes for this predictor; 
outcome-1-id,weight;outcome-2,weight;…outcome-n,weight;
Predictor-2 # of outcomes for this predictor; 
outcome-1-id,weight;outcome-2,weight;…outcome-n,weight;
…
Predictor-z # of outcomes for this predictor; 
outcome-1-id,weight;outcome-2,weight;…outcome-n,weight;

Models trained with other trainers are similar, but with slight variations.  I 
think the QNTrainer starts with “QN” instead of “GIS”, and the predictors/outcomes 
are reversed.

I’m doing this from memory, so it may be slightly different.  But this logic 
AND the source code should get you started.
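
A sketch of how to dump a raw maxent model to plain text for inspection; the class 
names here are from memory of the 1.8.x opennlp.tools.ml.maxent.io package, the file 
names are placeholders, and this only works on a bare maxent model file, not on a 
zip-packaged BaseModel.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import opennlp.tools.ml.maxent.io.PlainTextGISModelWriter;
import opennlp.tools.ml.maxent.io.SuffixSensitiveGISModelReader;
import opennlp.tools.ml.model.AbstractModel;

public class DumpMaxentModel {
    public static void main(String[] args) throws Exception {
        // Read a binary GIS maxent model and write it out as plain text so the
        // outcome/predictor/weight layout described above can be inspected.
        AbstractModel model =
            new SuffixSensitiveGISModelReader(new File("my-maxent-model.bin")).getModel();
        try (BufferedWriter out = new BufferedWriter(new FileWriter("my-maxent-model.txt"))) {
            new PlainTextGISModelWriter(model, out).persist();
        }
    }
}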

If you are looking at BaseModels, e.g. POSModel or SentenceModel, the 
format is a little more complicated and you will need to look at the code.  
These models contain more than just a maxent model; they also bundle the associated 
code and resources needed to produce the results you expect.

Hope it helps.
Dan

> On Jan 18, 2018, at 8:23 AM, Manoj B. Narayanan 
>  wrote:
> 
> Hi all,
> 
> Just curious to know what the content of the *.bin* file is. How are the
> probabilities of the features calculated and how are they used for
> prediction ?
> 
> I believe it will make my understanding better. Kindly guide me.
> 
> Thanks,
> Manoj.



Re: NER Features

2018-01-30 Thread Dan Russ
Damiano,
   You are treading in some dangerous waters.  You need to open up the black 
box of the model and peek inside.  I open up the NER model and get the maxent 
model inside.  I make the context using the feature generator, then I get the 
context map inside the maxent model to look up the weights.  I am grossly 
oversimplifying the process because otherwise I would be writing a treatise on 
OpenNLP.  I would suggest looking at the find() method and aligning what that 
method does with my comments on the steps you need to take.
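
As a very rough sketch of the "peek inside" step: once you have pried the 
MaxentModel out of the name finder model and built a context with the feature 
generators (the black-box parts above), a crude leave-one-out probe using only the 
public MaxentModel interface shows how much each feature moves the winning outcome.  
The feature strings you would pass in are real generator output; none are shown here.

import opennlp.tools.ml.model.MaxentModel;

public class FeatureInfluenceProbe {

    // context = the feature strings produced by the feature generators for one token
    static void probe(MaxentModel model, String[] context) {
        double[] full = model.eval(context);
        String best = model.getBestOutcome(full);
        int bestIdx = maxIndex(full);
        System.out.println("best outcome with all features: " + best);

        for (int i = 0; i < context.length; i++) {
            // Drop feature i and re-evaluate to see how much it contributed.
            String[] reduced = new String[context.length - 1];
            System.arraycopy(context, 0, reduced, 0, i);
            System.arraycopy(context, i + 1, reduced, i, context.length - i - 1);
            double[] without = model.eval(reduced);
            System.out.printf("%-25s delta on %s = %+.4f%n",
                context[i], best, full[bestIdx] - without[bestIdx]);
        }
    }

    static int maxIndex(double[] p) {
        int idx = 0;
        for (int i = 1; i < p.length; i++) {
            if (p[i] > p[idx]) {
                idx = i;
            }
        }
        return idx;
    }
}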

Hope it helps...
Daniel

> On Jan 30, 2018, at 12:10 PM, Damiano Porta  wrote:
> 
> Hello everybody,
> 
> how can we understand what are the most important features during the NER
> process? I mean.. when the TokenNameFinder selects a label is it possible
> to retrieve the most important features too ?
> 
> Thanks
> Damiano



Re: OpenNLP Spanish models

2018-09-13 Thread Dan Russ
Hello,
   Have you tried the old 1.5 SourceForge models available at 
http://opennlp.sourceforge.net/models-1.5/ ?  The Spanish models start with 
es-.
Daniel



> On Sep 13, 2018, at 5:03 AM, Elsa Cerezo Fernández  
> wrote:
> 
> Hello,
> 
> 
> 
> I do not know if you can help me:
> 
> 
> 
> I am using the OpenNLP library v1.8.4 as a NuGet in VisualStudio and the
> only Spanish models I have found are from previous versions of the library.
> Even so, I have used them and they do not work for me, it fails to
> initialize the TokenizerModel object. I have used the model in English v1.5
> and if it works correctly.
> 
> 
> 
> If there are no Spanish models available, could you tell me how to create
> one? I have seen that .train files are used with data for the training but
> I do not know the structure that they should have or anything.
> 
> 
> 
> Thanks in advance!
> 
> 
> 
> E. Cerezo



Re: OpenNLP Spanish models

2018-09-13 Thread Dan Russ
Sorry, the tokenizer model is not there.  Have you tried training a Spanish 
tokenizer using the Universal Dependencies data?
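
A sketch of that training path, assuming you have already converted the Universal 
Dependencies Spanish data into OpenNLP's tokenizer training format (one sentence per 
line, with <SPLIT> marking token boundaries that are not whitespace) in a file I'm 
calling es-token.train:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import opennlp.tools.tokenize.TokenSample;
import opennlp.tools.tokenize.TokenSampleStream;
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainSpanishTokenizer {
    public static void main(String[] args) throws Exception {
        ObjectStream<TokenSample> samples = new TokenSampleStream(
            new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("es-token.train")),
                StandardCharsets.UTF_8));

        // "es" language code, no abbreviation dictionary, default alphanumeric handling.
        TokenizerFactory factory = new TokenizerFactory("es", null, true, null);
        TokenizerModel model = TokenizerME.train(samples, factory,
            TrainingParameters.defaultParams());
        samples.close();

        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream("es-token.bin"))) {
            model.serialize(out);
        }
    }
}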
Daniel

> On Sep 13, 2018, at 12:03 PM, Dan Russ  wrote:
> 
> Hello,
> Have you tried the old 1.5 SourceForge models available at 
> http://opennlp.sourceforge.net/models-1.5/ ?  The Spanish models start 
> with es-.
> Daniel
> 
> 
> 
>> On Sep 13, 2018, at 5:03 AM, Elsa Cerezo Fernández  wrote:
>> 
>> Hello,
>> 
>> 
>> 
>> I do not know if you can help me:
>> 
>> 
>> 
>> I am using the OpenNLP library v1.8.4 as a NuGet in VisualStudio and the
>> only Spanish models I have found are from previous versions of the library.
>> Even so, I have used them and they do not work for me, it fails to
>> initialize the TokenizerModel object. I have used the model in English v1.5
>> and if it works correctly.
>> 
>> 
>> 
>> If there are no Spanish models available, could you tell me how to create
>> one? I have seen that .train files are used with data for the training but
>> I do not know the structure that they should have or anything.
>> 
>> 
>> 
>> Thanks in advance!
>> 
>> 
>> 
>> E. Cerezo
>