Re: BRAT

2018-02-01 Thread Daniel Russ
I don’t know of any wiki page specifically about using BRAT data.  However, 
first read:

https://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.namefind.training.tool
 

   

Then issue the command:
opennlp TokenNameFinderTrainer.brat
The help documentation should be sufficient.  Let us know if you have a 
specific question after that.  Again, this should be posted to the users list, 
us...@opennlp.apache.org, not the dev list, since other 
people may have a similar question.  I use the BRAT annotator and train models 
using the data.  It works very well for token name finding.
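For readers new to the BRAT data itself: each annotated document is a .txt file plus a standoff .ann file, and entity annotations are the tab-separated "T" lines. OpenNLP's .brat format support reads these files for you; the hand-written Java sketch below does not use OpenNLP at all and only illustrates what those lines contain and how their offsets map onto the text. It assumes the common single-span form "T<id>\t<Type> <start> <end>\t<text>" and skips discontinuous spans and every other annotation kind.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative reader for the entity ("T") lines of a BRAT standoff .ann file.
// Assumed line format: "T<id>\t<Type> <start> <end>\t<covered text>".
// Discontinuous spans ("0 5;10 15") and relations, events, attributes and
// notes (lines starting with R, E, A, #, ...) are skipped in this sketch.
final class BratEntities {

    static final class Entity {
        final String id;
        final String type;
        final int start; // character offset into the .txt file (inclusive)
        final int end;   // character offset into the .txt file (exclusive)
        final String text;

        Entity(String id, String type, int start, int end, String text) {
            this.id = id;
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return id + " " + type + " [" + start + ", " + end + ") \"" + text + "\"";
        }
    }

    static List<Entity> parse(List<String> annLines) {
        List<Entity> entities = new ArrayList<>();
        for (String line : annLines) {
            if (!line.startsWith("T")) {
                continue; // not a text-bound annotation
            }
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // malformed line
            }
            String[] typeAndOffsets = cols[1].split(" ");
            if (typeAndOffsets.length != 3) {
                continue; // discontinuous span such as "Person 0 7;8 13"
            }
            entities.add(new Entity(cols[0], typeAndOffsets[0],
                    Integer.parseInt(typeAndOffsets[1]),
                    Integer.parseInt(typeAndOffsets[2]),
                    cols[2]));
        }
        return entities;
    }

    public static void main(String[] args) {
        String text = "Damiano Porta asked about BRAT.";
        List<String> ann = List.of(
                "T1\tPerson 0 13\tDamiano Porta",
                "T2\tSoftware 26 30\tBRAT",
                "#1\tAnnotatorNotes T1\tmailing list user");
        for (Entity e : parse(ann)) {
            // The offsets should select exactly the covered text from the .txt file.
            System.out.println(e + " matches text: " + text.substring(e.start, e.end).equals(e.text));
        }
    }
}
```

The entity types ("Person", "Software") and the sample sentence are made up for the example; in practice they come from your own BRAT annotation configuration.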

Daniel



On Feb 1, 2018 5:51 AM, "Damiano Porta" <damianopo...@gmail.com> wrote:
Hello everybody,
is there a wiki to understand how to use BRAT with OpenNLP for annotations?
Thank you!

Damiano



Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-12 Thread Daniel Russ
+1

Daniel

> On Sep 12, 2017, at 2:37 AM, Suneel Marthi  wrote:
> 
> +1 binding
> 
> On Tue, Sep 12, 2017 at 8:10 AM, Tommaso Teofili 
> wrote:
> 
>> +1
>> 
>> Tommaso
>> 
>> On Mon, Sep 11, 2017 at 09:12, Joern Kottmann <kottm...@gmail.com>
>> wrote:
>> 
>>> Hi Folks,
>>> 
>>> 
>>> I have posted a second release candidate for the Apache OpenNLP 1.8.2
>>> release and it is ready for testing.
>>> 
>>> 
>>> The RC 2 distributables can be downloaded from here:
>>> 
>>> https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/
>>> 
>>> 
>>> The release was made from the Apache OpenNLP 1.8.2 tag at
>>> https://github.com/apache/opennlp/tree/opennlp-1.8.2
>>> 
>>> 
>>> To use it in a maven build set the version for opennlp-tools or
>>> opennlp-uima to 1.8.2 and add the following URL to your settings.xml
>>> file:
>>> https://repository.apache.org/content/repositories/orgapacheopennlp-1018
>>> 
>>> The release was made using the OpenNLP release process, documented on
>>> the Wiki here:
>>> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>>> 
>>> The release contains quite a few changes; please refer to the contained
>>> issue list for details.
>>> 
>>> 
>>> Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote
>>> is open for at least the next 72 hours.
>>> 
>>> 
>>> Only votes from OpenNLP PMC are binding, but folks are welcome to check
>>> the release candidate and voice their approval or disapproval. The vote
>>> passes if at least three binding +1 votes are cast.
>>> 
>>> 
>>> [ ] +1 Release the packages as Apache OpenNLP 1.8.2
>>> [ ] -1 Do not release the packages because...
>>> 
>>> 
>>> Thanks!
>>> 
>>> Jörn
>>> 
>>> P.S. Here is my +1.
>>> 
>> 



Re: Cache

2017-09-05 Thread Daniel Russ
Again, you should send this to the users mailing list, not the dev list.

Have you tried adding an instance variable (e.g. numWords) that you update 
when you call “createFeatures”? You need to be concerned with thread safety if 
you do this on more than one thread, but you can synchronize only the part of the 
code that updates the instance variable.  You should also be careful to create 
only one instance of the AdaptiveFeatureGenerator.

You might also consider resetting the value in the clearAdaptiveData method.  
However, that might not work for you, and you may need a separate method to reset your 
feature.
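A minimal sketch of the instance-variable idea above, assuming Java 8+ default methods. So that it compiles without the OpenNLP jar, it declares a stand-in interface mirroring the method names of opennlp.tools.util.featuregen.AdaptiveFeatureGenerator (createFeatures, updateAdaptiveData, clearAdaptiveData); with OpenNLP on the classpath you would import the real interface instead. The generator class, the numWords counter and the "numWords=" feature string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in mirroring opennlp.tools.util.featuregen.AdaptiveFeatureGenerator so the
// sketch compiles on its own; replace it with the real interface when using OpenNLP.
interface AdaptiveFeatureGenerator {
    void createFeatures(List<String> features, String[] tokens, int index, String[] previousOutcomes);

    default void updateAdaptiveData(String[] tokens, String[] outcomes) {}

    default void clearAdaptiveData() {}
}

// Hypothetical generator that keeps state in an instance variable (numWords).
// Create ONE instance and share it; only the update of the shared counter is synchronized.
class WordCountingFeatureGenerator implements AdaptiveFeatureGenerator {
    private final Object lock = new Object();
    private int numWords; // guarded by lock

    @Override
    public void createFeatures(List<String> features, String[] tokens, int index,
                               String[] previousOutcomes) {
        int seen;
        synchronized (lock) {
            numWords++;
            seen = numWords;
        }
        // Work that does not touch shared state stays outside the synchronized block.
        features.add("numWords=" + seen);
    }

    @Override
    public void clearAdaptiveData() {
        synchronized (lock) {
            numWords = 0; // reset when the caller clears adaptive data
        }
    }

    int numWords() {
        synchronized (lock) {
            return numWords;
        }
    }

    public static void main(String[] args) {
        WordCountingFeatureGenerator generator = new WordCountingFeatureGenerator();
        String[] tokens = {"Daniel", "replied", "today"};
        List<String> features = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            generator.createFeatures(features, tokens, i, new String[i]);
        }
        System.out.println(features);             // [numWords=1, numWords=2, numWords=3]
        generator.clearAdaptiveData();
        System.out.println(generator.numWords()); // 0
    }
}
```

With several threads sharing the instance, the value a particular token sees depends on scheduling, so a counter like this is mostly useful as an aggregate statistic rather than a per-token feature.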

Let me know if that works for you.

Daniel

> On Sep 5, 2017, at 6:48 AM, Manoj B. Narayanan 
>  wrote:
> 
> Hi,
> 
> Could anyone please help me out.
> 
> Thanks,
> Manoj
> 
> On Wed, Aug 30, 2017 at 10:47 AM, Manoj B. Narayanan <
> manojb.narayanan2...@gmail.com> wrote:
> 
>> Hi,
>> 
>> While training a NER model, I provide custom features by implementing
>> *AdaptiveFeatureGenerator*. In the implementing class I compute certain
>> features and add them. When I have another implementing class for adding
>> features, I compute features that have sometimes already been computed in
>> the first implementing class. I am not aware of a mechanism that lets me use
>> the information from the first implementing class without computing it again.
>> 
>> So, if there is a way where I could store some information about a
>> sentence in a global manner, I would be able to get them in all the classes
>> I need without computing them.
>> 
>> If there is a provision for this already, please guide me. Else, I put
>> forth this as a suggestion/request.
>> 
>> Thanks,
>> Manoj.
>> 
>> 
>> 



Re: DictionaryNameFinder

2017-09-05 Thread Daniel Russ
Hi Manoj,
   Please send your question to the users list, not the dev list.

   I believe the DictionaryNameFinder is passed a dictionary of names, and if a 
name appears in the dictionary, it is marked as found.  Otherwise, no name is 
found.  It is not a statistical model.  The two methods you describe are 
similar (but I won’t promise they are exactly the same).  I would use the 
DictionaryNameFinder, because I trust that it is implemented well, but if your 
code is faster and you trust it, go with it.
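To make the comparison concrete, here is a small sketch of exact, non-statistical dictionary matching as described above. It is written from scratch and is not the DictionaryNameFinder source; it assumes case-sensitive matching and a greedy left-to-right longest match. On the time question: looking up a candidate phrase in a hash set has roughly the same expected cost however many names the dictionary holds, whereas looping over a List<String> and calling equals on every entry costs time proportional to the list size for each lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of exact, non-statistical dictionary matching: scan the tokens left to right
// and, at each position, take the longest token sequence found in the dictionary.
// Written for illustration; it is not the DictionaryNameFinder source.
final class SimpleDictionaryFinder {
    private final Set<List<String>> names = new HashSet<>();
    private int maxTokens = 0;

    SimpleDictionaryFinder(List<List<String>> entries) {
        for (List<String> entry : entries) {
            names.add(List.copyOf(entry)); // immutable copy used as the hash key
            maxTokens = Math.max(maxTokens, entry.size());
        }
    }

    /** Returns the [start, end) token spans of dictionary names found in tokens. */
    List<int[]> find(String[] tokens) {
        List<String> tokenList = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            int limit = Math.min(tokens.length, start + maxTokens);
            for (int end = start + 1; end <= limit; end++) {
                // Case-sensitive hash lookup; expected cost does not grow with dictionary size.
                if (names.contains(tokenList.subList(start, end))) {
                    bestEnd = end; // keep going so the longest match wins
                }
            }
            if (bestEnd > 0) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd; // continue after the match
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        SimpleDictionaryFinder finder = new SimpleDictionaryFinder(List.of(
                List.of("New", "York"),
                List.of("New", "York", "City"),
                List.of("Boston")));
        String[] tokens = {"He", "moved", "from", "New", "York", "City", "to", "Boston"};
        for (int[] span : finder.find(tokens)) {
            System.out.println(String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
        }
        // prints "New York City" then "Boston"
    }
}
```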

Daniel

> On Sep 1, 2017, at 8:56 AM, Manoj B. Narayanan 
>  wrote:
> 
> Hi,
> 
> Can someone please explain how the DictionaryNameFinder works.
> 
> What will be the difference between
> 
>   1.  DictionaryNameFinder
>   2.  Maintaining custom lists in code and performing String comparisons.
> 
> Is there any computational (time/storage) advantage using one over the
> other?
> 
> Please guide me.
> 
> Thanks.



Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-05 Thread Daniel Russ
+1 binding

(Thanks to Jörn and Suneel for the help.)
Daniel

> On Sep 4, 2017, at 11:08 PM, Suneel Marthi  wrote:
> 
> +1 binding
> 
> On Mon, Sep 4, 2017 at 5:41 PM, Joern Kottmann  wrote:
> 
>> Hi Folks,
>> 
>> 
>> I have posted a first release candidate for the Apache OpenNLP 1.8.2
>> release and it is ready for testing.
>> 
>> 
>> The RC 1 distributables can be downloaded from here:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/
>> 
>> 
>> The release was made from the Apache OpenNLP 1.8.2 tag at
>> https://github.com/apache/opennlp/tree/opennlp-1.8.2
>> 
>> 
>> To use it in a maven build set the version for opennlp-tools or
>> opennlp-uima to 1.8.2 and add the following URL to your settings.xml
>> file:
>> https://repository.apache.org/content/repositories/orgapacheopennlp-1017
>> 
>> The release was made using the OpenNLP release process, documented on
>> the Wiki here:
>> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>> 
>> The release contains quite a few changes; please refer to the contained
>> issue list for details.
>> 
>> 
>> Please vote on releasing these packages as Apache OpenNLP 1.8.2. The vote
>> is open for at least the next 72 hours.
>> 
>> 
>> Only votes from OpenNLP PMC are binding, but folks are welcome to check the
>> release candidate and voice their approval or disapproval. The vote passes
>> if at least three binding +1 votes are cast.
>> 
>> 
>> [ ] +1 Release the packages as Apache OpenNLP 1.8.2
>> [ ] -1 Do not release the packages because...
>> 
>> 
>> Thanks!
>> 
>> Jörn
>> 
>> P.S. Here is my +1.
>> 





Re: Early stopping NameFinderME

2017-08-25 Thread Daniel Russ
Jörn,

   Currently, GISTrainer has a private static final variable, LLThreshold, which 
determines whether the change in the log likelihood between two iterations is too 
small to continue training.  We could make this a parameter.  I am concerned about using accuracy 
to train the model.  If we use accuracy, the weight space may be flat.  
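A sketch of the kind of log-likelihood stopping rule described above, written as a standalone class rather than as a change to GISTrainer. The threshold and window are constructor arguments here (the "make this a parameter" idea): window = 1 compares consecutive iterations, while window = 5 with threshold 0.05 would correspond to the "compare step x+5 with step x" request quoted further down. The class name and the log-likelihood values in the example are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stopping rule: stop when the training log-likelihood improved by less
// than `threshold` over the last `window` iterations. A drop in log-likelihood
// (negative gain) also triggers a stop.
final class LogLikelihoodStopper {
    private final double threshold;
    private final int window;
    private final List<Double> history = new ArrayList<>();

    LogLikelihoodStopper(double threshold, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        this.threshold = threshold;
        this.window = window;
    }

    /** Records this iteration's log-likelihood and returns true if training should stop. */
    boolean shouldStop(double logLikelihood) {
        history.add(logLikelihood);
        int n = history.size();
        if (n <= window) {
            return false; // not enough iterations to compare yet
        }
        double gain = logLikelihood - history.get(n - 1 - window);
        return gain < threshold;
    }

    public static void main(String[] args) {
        LogLikelihoodStopper stopper = new LogLikelihoodStopper(1e-4, 1);
        double[] ll = {-100.0, -60.0, -45.0, -44.99995, -44.9999};
        for (int i = 0; i < ll.length; i++) {
            if (stopper.shouldStop(ll[i])) {
                System.out.println("stop after iteration " + (i + 1)); // stop after iteration 4
                break;
            }
        }
    }
}
```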

   Saurabh, you use the term “early stopping”.  In deep learning, early 
stopping is used to prevent overtraining and improve generalization to unseen 
data.  I am not sure early stopping serves the same purpose with GIS training.  
Does anyone know if early stopping improves generalization for a maxent problem?
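For comparison, the deep-learning style of early stopping mentioned above usually monitors a score on held-out data rather than the training likelihood: remember the best round so far and stop once `patience` rounds pass without a meaningful improvement. A minimal sketch, assuming higher scores are better (for example F1 on a development set); the class, its parameters and the scores are hypothetical, and whether this helps maxent/GIS generalization is exactly the open question above.

```java
// Hypothetical patience-based early stopping on a held-out score (higher is better).
final class PatienceEarlyStopping {
    private final int patience;    // rounds allowed without improvement
    private final double minDelta; // smallest change that counts as an improvement
    private double bestScore = Double.NEGATIVE_INFINITY;
    private int bestRound = -1;
    private int roundsWithoutImprovement = 0;

    PatienceEarlyStopping(int patience, double minDelta) {
        this.patience = patience;
        this.minDelta = minDelta;
    }

    /** Reports the held-out score for a training round; returns true when training should stop. */
    boolean report(int round, double heldOutScore) {
        if (heldOutScore > bestScore + minDelta) {
            bestScore = heldOutScore;
            bestRound = round;
            roundsWithoutImprovement = 0;
        } else {
            roundsWithoutImprovement++;
        }
        return roundsWithoutImprovement >= patience;
    }

    /** The round whose model you would keep (-1 if nothing was reported). */
    int bestRound() {
        return bestRound;
    }

    public static void main(String[] args) {
        PatienceEarlyStopping stopper = new PatienceEarlyStopping(2, 0.001);
        // The last value is never reached: a small patience can stop before a later gain.
        double[] devF1 = {0.70, 0.75, 0.78, 0.779, 0.776, 0.80};
        for (int round = 0; round < devF1.length; round++) {
            if (stopper.report(round, devF1[round])) {
                // stopped at round 4, best round 2
                System.out.println("stopped at round " + round + ", best round " + stopper.bestRound());
                break;
            }
        }
    }
}
```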

Daniel

> On Aug 24, 2017, at 4:48 AM, Joern Kottmann  wrote:
> 
> You are the first one who ever asked this question. I think we have this as
> an option already in the GIS trainer, but it is not exposed all the way
> through.
> 
> Please open a JIRA issue and I can look at it next week.
> 
> Jörn
> 
> On Aug 21, 2017 5:11 PM, "Saurabh Jain"  wrote:
> 
>> Hi All
>> 
>> How can we use early stopping while training/cross-validating custom data
>> with NameFinder? What I want is: if the change in the likelihood value or the
>> accuracy of the model is less than 0.05 between two steps that differ by 5
>> (i.e. comparing the output of step x+5 with that of step x), then training
>> should stop. I could not find anything about this in the documentation. Can
>> someone please help?
>> 
>> --
>> *Thanks & Regards*
>> 
>> 
>> *Saurabh Jain *
>> *AI Developer*
>> 
>> *Active Intelligence  *
>> 
>> *"To do a thing yesterday was the best time. Second best time is today."*
>> 



Re: Spelling correction

2017-07-01 Thread Daniel Russ
Damiano,

There is a lot of research on spelling correction.  Here is a paper from a 
group out of the National Library of Medicine:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137159/
They also have a product called GSpell,
https://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/gSpell/current/GSpell.html,
which uses the NLM lexicon.  It might not work for OpenNLP (it is too English-based), 
but these are things to look into.  I dabble in the spelling correction field, but I have 
not worked seriously in it.  I’d be willing to help on this project, but I don’t 
have a lot of time.
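As a concrete starting point for the ideas raised in this thread, here is a toy noisy-channel corrector: for an observed word w it picks the vocabulary word c that maximizes log S(c | previous word) + log P(w | c). The language-model term uses stupid backoff (relative bigram frequency, otherwise 0.4 times the unigram frequency), the technique suggested below; the error model is a crude assumption (a fixed factor of 0.01 per Levenshtein edit, at most two edits). It is written from scratch, does not use OpenNLP's language-model classes, and the tiny corpus is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy noisy-channel spelling corrector:
//   best c = argmax  log S(c | prev) + log P(w | c)
// S is a stupid-backoff bigram score (not a normalized probability) and
// P(w | c) = CHANNEL_COST ^ editDistance(w, c) is an assumed, very crude error model.
final class NoisyChannelSpeller {
    private static final double BACKOFF = 0.4;       // backoff factor used by stupid backoff
    private static final double CHANNEL_COST = 0.01; // assumed probability factor per edit
    private static final int MAX_EDITS = 2;

    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();
    private final int totalTokens;

    NoisyChannelSpeller(String[] corpus) {
        for (int i = 0; i < corpus.length; i++) {
            unigrams.merge(corpus[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(corpus[i - 1] + " " + corpus[i], 1, Integer::sum);
            }
        }
        totalTokens = corpus.length;
    }

    private double unigramScore(String c) {
        return (double) unigrams.getOrDefault(c, 0) / totalTokens;
    }

    /** Stupid backoff: relative bigram frequency if seen, else 0.4 * unigram frequency. */
    double lmScore(String prev, String c) {
        if (prev == null) {
            return unigramScore(c); // no context available
        }
        Integer bigramCount = bigrams.get(prev + " " + c);
        if (bigramCount != null) {
            return (double) bigramCount / unigrams.get(prev);
        }
        return BACKOFF * unigramScore(c);
    }

    /** Returns the best vocabulary word for `word`, or `word` itself if nothing is close enough. */
    String correct(String prev, String word) {
        String best = word;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : unigrams.keySet()) {
            int edits = editDistance(word, candidate);
            if (edits > MAX_EDITS) {
                continue;
            }
            double score = Math.log(lmScore(prev, candidate)) + edits * Math.log(CHANNEL_COST);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    /** Levenshtein distance (insertions, deletions, substitutions). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        NoisyChannelSpeller speller = new NoisyChannelSpeller("the cat sat on the mat the cat ate".split(" "));
        System.out.println(speller.correct("the", "cta")); // cat
        System.out.println(speller.correct(null, "teh"));  // the
    }
}
```

A real system would replace the per-edit constant with an error model learned from misspelling pairs and use a much larger corpus for the language model.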

Daniel


> On Jul 1, 2017, at 7:20 PM, Suneel Marthi  wrote:
> 
> You could also leverage language models for spelling correction. OpenNLP has a
> stupid-backoff implementation: create a language model with that algorithm
> and use it for spell checks.
> 
> On Sat, Jul 1, 2017 at 2:43 PM, Damiano Porta 
> wrote:
> 
>> I also read about the noisy channel model. I could work on this if you think
>> it is a good idea.
>> 
>> Damiano
>> 
>> On Jul 1, 2017 20:16, "Suneel Marthi" wrote:
>> 
>>> 'Spelling Correction' has been the most popular request from the audience at
>>> my recent NLP talks; it would be great to have this feature in OpenNLP.
>>> 
>>> I am not aware of any papers on this, but the first thing that comes to
>>> mind and is relevant is the 'Noisy channel'.
>>> 
>>> 
>>> 
>>> On Sat, Jul 1, 2017 at 2:04 PM, Damiano Porta 
>>> wrote:
>>> 
>>>> Hello everybody,
>>>> I am dealing with data normalization on very bad sentences with many
>>>> spelling errors.
>>>>
>>>> Do you know a good paper to understand how to build a model that will fix
>>>> this kind of problem?
>>>> I can share the code without problems if you are interested in integrating
>>>> it into OpenNLP.
>>>>
>>>> Thanks
>>>> Damiano
>>>>
>>> 
>> 



Re: [GitHub] opennlp pull request #231: Adding sentiment analysis code to OpenNLP: OPENNL...

2017-06-15 Thread Daniel Russ
Hi,
   I tried to take a look at the pull request, but it is 1677 commits behind 
apache:master.  Can you please rebase your code?
Thank you.
Daniel

> On Jun 15, 2017, at 1:19 PM, amensiko  wrote:
> 
> GitHub user amensiko opened a pull request:
> 
>https://github.com/apache/opennlp/pull/231
> 
>Adding sentiment analysis code to OpenNLP: OPENNLP-840
> 
> 
> 
> 
> You can merge this pull request into a Git repository by running:
> 
>$ git pull https://github.com/amensiko/opennlp OPENNLP-840-2
> 
> Alternatively you can review and apply these changes as the patch at:
> 
>https://github.com/apache/opennlp/pull/231.patch
> 
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
> 
>This closes #231
> 
> 
> commit 21fbd8bd15387725e546ed950f7e9cb49cd3d840
> Author: Menshikova 
> Date:   2017-06-15T16:59:19Z
> 
>Adding sentiment analysis code to OpenNLP: OPENNLP-840
> 
> 
> 
> 



Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread Daniel Russ
+1 binding
Completed my evaluation on my external code.  All tests passed.


> On May 12, 2017, at 11:51 AM, William Colen  wrote:
> 
> +1 binding
> Executed the complete evaluation suite, both in source distribution and the
> git tag. Integrated and tested with other tools.
> 
> 
> 2017-05-12 9:48 GMT-03:00 Joern Kottmann :
> 
>> The vote is still open and we won't close it before the entire active PMC
>> has voted or the time has passed.
>> 
>> Jörn
>> 
>> On Fri, May 12, 2017 at 2:29 PM, Daniel Russ  wrote:
>> 
>>> Even though we have enough binding votes to release, can I have a few
>>> hours to complete testing of my code with 1.8.0RC2 before release?
>>> Daniel
>>> 
>>> On May 11, 2017 12:38 PM, "Joern Kottmann"  wrote:
>>> 
>>>> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
>>>> 1.8.0 Release Candidate 2.
>>>> 
>>>> The RC 2 distributables can be downloaded from here:
>>>> https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/
>>>> 
>>>> The release was made from the Apache OpenNLP 1.8.0 tag at
>>>> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>>>> 
>>>> To use it in a maven build set the version for opennlp-tools or
>>>> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
>>>> file:
>>>> https://repository.apache.org/content/repositories/orgapacheopennlp-1012
>>>> 
>>>> The release was made using the OpenNLP release process, documented on
>>>> the Wiki here:
>>>> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>>>> 
>>>> The release contains quite a few changes; please refer to the contained
>>>> issue list for details.
>>>> 
>>>> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
>>>> vote is open for at least the next 72 hours.
>>>> 
>>>> Only votes from OpenNLP PMC are binding, but folks are welcome to check
>>>> the release candidate and voice their approval or disapproval. The vote
>>>> passes if at least three binding +1 votes are cast.
>>>> 
>>>> [ ] +1 Release the packages as Apache OpenNLP 1.8.0
>>>> [ ] -1 Do not release the packages because...
>>>> 
>>>> 
>>>> Thanks!
>>>> 
>>>> Jörn
>>>> 
>>>> P.S. Here is my +1.
>>>> 
>>> 
>> 



Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread Daniel Russ
Even though we have enough binding votes to release, can I have a few hours
to complete testing of my code with 1.8.0RC2 before release?
Daniel

On May 11, 2017 12:38 PM, "Joern Kottmann"  wrote:

> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.0 Release Candidate 2.
>
> The RC 2 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/
>
> The release was made from the Apache OpenNLP 1.8.0 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1012
>
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>
> The release contains quite a few changes; please refer to the contained
> issue list for details.
>
> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> vote is open for at least the next 72 hours.
>
> Only votes from OpenNLP PMC are binding, but folks are welcome to check
> the release candidate and voice their approval or disapproval. The vote
> passes if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache OpenNLP 1.8.0
> [ ] -1 Do not release the packages because...
>
>
> Thanks!
>
> Jörn
>
> P.S. Here is my +1.
>


Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Daniel Russ
-1 because of DictionaryLemmatizer bug OPENNLP-1056
Daniel

> On May 9, 2017, at 2:41 PM, Joern Kottmann  wrote:
> 
> The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> 1.8.0 Release Candidate 1. 
> 
> The RC 1 distributables can be downloaded from here:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/
> 
> The release was made from the Apache OpenNLP 1.8.0 tag at
> https://github.com/apache/opennlp/tree/opennlp-1.8.0
>  
> To use it in a maven build set the version for opennlp-tools or
> opennlp-uima to 1.8.0 and add the following URL to your settings.xml
> file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1011
>  
> The release was made using the OpenNLP release process, documented on
> the Wiki here:
> https://cwiki.apache.org/confluence/display/OPENNLP/Release+Process
>  
> The release contains quite a few changes; please refer to the contained
> issue list for details.
>  
> Please vote on releasing these packages as Apache OpenNLP 1.8.0. The
> vote is open for at least the next 72 hours.
>  
> Only votes from OpenNLP PMC are binding, but folks are welcome to check
> the release candidate and voice their approval or disapproval. The vote
> passes if at least three binding +1 votes are cast.
>  
> [ ] +1 Release the packages as Apache OpenNLP 
> [ ] -1 Do not release the packages because...
>  
>  
> Thanks!
> 
> Jörn
> 
> P.S. Here is my +1.



Re: Problem in passing feature generator for NameFinderCrossValidation

2017-04-21 Thread Daniel Russ
Hi Saurabh,
   I am a little confused about why you need a byte[].  Can't you do this:

1.  Split your data into 5 folds. (It doesn’t have to be 5, but that makes for a 
more concrete example.)
2.  Train on 4 folds and test on 1. (Run this 5 times, changing the test fold each time.)
3.  Look at the average agreement.
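The three steps above, sketched in Java under some assumptions: samples are assigned to folds round-robin (with real data you would usually shuffle first, or keep sentences from one document together), and trainAndScore is a hypothetical callback standing in for "train a name finder on these samples and return its score on those"; it is not an OpenNLP API. Returning every fold's score, not only the average, also gives the variance mentioned below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch of k-fold cross-validation: split into k folds, train on k-1, test on the
// held-out fold, and repeat k times so every fold is the test set exactly once.
final class KFold {

    /** Assigns samples to k folds round-robin (shuffle beforehand for real data). */
    static <T> List<List<T>> split(List<T> samples, int k) {
        List<List<T>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < samples.size(); i++) {
            folds.get(i % k).add(samples.get(i));
        }
        return folds;
    }

    /** Returns one score per fold; trainAndScore(train, test) is a hypothetical callback. */
    static <T> double[] crossValidate(List<T> samples, int k,
                                      BiFunction<List<T>, List<T>, Double> trainAndScore) {
        List<List<T>> folds = split(samples, k);
        double[] scores = new double[k];
        for (int test = 0; test < k; test++) {
            List<T> train = new ArrayList<>();
            for (int f = 0; f < k; f++) {
                if (f != test) {
                    train.addAll(folds.get(f));
                }
            }
            scores[test] = trainAndScore.apply(train, folds.get(test));
        }
        return scores;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample variance across folds (n - 1 in the denominator; needs k >= 2). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double sum = 0;
        for (double x : xs) sum += (x - m) * (x - m);
        return sum / (xs.length - 1);
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 10; i++) samples.add(i);
        // Placeholder "model": scores each fold by the fraction of samples it trained on.
        double[] scores = crossValidate(samples, 5,
                (train, test) -> (double) train.size() / samples.size());
        System.out.println("mean " + mean(scores) + ", variance " + variance(scores));
    }
}
```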

I do things a little differently from most people.  My final model would use ALL the data.  I 
cross-validate to get an idea of robustness, accuracy, and variance.
Daniel

> On Apr 21, 2017, at 9:17 AM, Saurabh Jain  wrote:
> 
> Hi All
> 
> I have defined feature generator for OpenNLP name finder in java source
> code as an object of *CachedFeatureGenerator *. I have to cross validate
> NameFinder and whatever api I am able to find in code accepts feature
> generators as byte array. Problem is  *CachedFeatureGenerator *is not
> serializable (as far as I came to know). Is there any api in OpenNLP
> NameFinder for cross validation which accept *CachedFeatureGenerator *as
> feature generator or is there any other way ?
> 
> -- 
> *Thanks & Regards*
> 
> 
> *Saurabh Jain *
> *AI Developer*
> 
> *Active Intelligence  *
> 
> *"To do a thing yesterday was the best time. Second best time is today."*