Re: Requesting for subscription

2016-03-07 Thread Mattmann, Chris A (3980)
Hi Thirupathi,

I believe that indeed the below can be done in OpenNLP. Can you
check the website and/or wiki page and identify if it meets
your needs?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





-----Original Message-----
From: Thirupathi Nerella 
Reply-To: "dev@opennlp.apache.org" 
Date: Monday, February 29, 2016 at 10:53 PM
To: "dev@opennlp.apache.org" 
Subject: Requesting for subscription

>Hi,
>
>Can we do the following in OpenNLP?
>
>Stemming and Lemmatization,
>Resolving Synonyms,
>Word sense disambiguation,
>Negativity exclusion,
>Eliminate punctuation and case sensitivity,
>Detection of duplicates
>
>Please respond quickly.
>
>Thanks in advance
>
>-- 
>
>
>*Thanks & Regards,Thirupathi Nerella,*
>
>*Cell : +91-9652647069,*
>
>*Software Engineer at ATMECS Technologies Pvt Ltd,*
>*Lanco Hills, Hyderabad.*



Re: Question about deprecated NameFinderME constructors

2016-03-07 Thread Cohan Sujay Carlos
Dear Rodrigo,

Thank you for the informative reply.

I just wanted to say I feel there is a use-case that the new constructor
still does not support.  Let me explain with an example.

Let's first take the example of brown-feature.xml, which is defined as ...

[XML descriptor not preserved in the archive]

... In this feature generator, I believe "window" maps to the
WindowFeatureGenerator and "token" maps to the TokenFeatureGenerator.

It's clear that we can create new feature generators that are combinations
of existing feature generators.

However, let's say I have a task / language where none of the existing
feature generators or combinations work very well.

Say, for example, that I want to create a new feature generator that pulls
out morphemes from agglutinative South Indian languages ... let's call it
"AgglutinativeSouthIndianLanguageMorphologicalSuffixFeatureGenerator".

It's not clear how one could create XML tags for this feature generator
using the new constructor.

The same thing is easy to do programmatically using the old constructors:
I would just extend the AdaptiveFeatureGenerator.

So, I was wondering ... are we giving up some API flexibility and
simplicity by removing the constructors that enable me to use subclasses of
AdaptiveFeatureGenerator, while there is no easy way to create something
like an AgglutinativeSouthIndianLanguageMorphologicalSuffixFeatureGenerator
and use it as a feature generator in the NameFinderME using the new
constructor's XML specification?
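
To make the use-case concrete, here is a minimal sketch of the kind of generator I have in mind. It is only an illustration: the nested FeatureGenerator interface is a local stand-in that mirrors the createFeatures method of AdaptiveFeatureGenerator as I understand it (the adaptive-data methods are left out), and the class name, the "suf=" feature prefix and the suffix strings are placeholders chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SuffixFeatureSketch {

    // Local stand-in for OpenNLP's AdaptiveFeatureGenerator (assumption: only
    // createFeatures is mirrored here; this is not the library type).
    public interface FeatureGenerator {
        void createFeatures(List<String> features, String[] tokens, int index,
                            String[] previousOutcomes);
    }

    // Hypothetical generator: emits one feature per known suffix the token ends with.
    public static class MorphologicalSuffixFeatureGenerator implements FeatureGenerator {
        private final String[] suffixes;

        public MorphologicalSuffixFeatureGenerator(String... suffixes) {
            this.suffixes = suffixes;
        }

        @Override
        public void createFeatures(List<String> features, String[] tokens, int index,
                                   String[] previousOutcomes) {
            String token = tokens[index].toLowerCase(Locale.ROOT);
            for (String suffix : suffixes) {
                // Require a non-empty stem so a bare suffix is not counted as a match.
                if (token.length() > suffix.length() && token.endsWith(suffix)) {
                    features.add("suf=" + suffix);
                }
            }
        }
    }

    // Convenience helper: features for one token, using placeholder suffixes.
    public static List<String> featuresFor(String[] tokens, int index) {
        FeatureGenerator generator =
            new MorphologicalSuffixFeatureGenerator("galu", "alli", "inda");
        List<String> features = new ArrayList<>();
        generator.createFeatures(features, tokens, index, new String[0]);
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"manegalu", "bengaluralli"};
        System.out.println(featuresFor(tokens, 0)); // [suf=galu]
        System.out.println(featuresFor(tokens, 1)); // [suf=alli]
    }
}
```

With the old constructors, an object like this could be handed straight to the NameFinderME; my question is how the same logic would be expressed through the XML descriptor.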

Cohan Sujay Carlos
Aiaioo Labs, +91-77605-80015, http://www.aiaioo.com

On Mon, Mar 7, 2016 at 4:37 PM, Rodrigo Agerri  wrote:

> Hi,
>
> You can do all those tasks by using the create method in the
> TokenNameFinderFactory:
>
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/TokenNameFinderFactory.java?revision=1712553&view=markup#l100
>
> For that you need to:
>
> 1. Provide the name of the factory class you are using, it could be
> the same factory class: TokenNameFinderFactory.class.getName()
> 2. Create an XML descriptor and pass it as a byte[] array
> 3. Load the resources (e.g., clusters) in a resources map consisting
> of the id of the resource and the serializer.
> 4. The sequenceCodec: BIO or BILOU.
>
> The NameFinder documentation was already updated:
>
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml?view=markup
>
> There is sample code to do that in the CLI class:
>
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/namefind/TokenNameFinderTrainerTool.java?revision=1674262&view=markup
>
> and to run it from the CLI:
>
> 1. Create an XML feature descriptor, e.g., brown-feature.xml
>
> [XML descriptor not preserved in the archive]
>
> 2. Put your clustering lexicon(s) in a directory, e.g., clusters
> 3. Train: bin/opennlp TokenNameFinderTrainer -featuregen brown.xml
> -resources clusters/ -params lang/ml/PerceptronTrainerParams.txt -lang
> en -model brown.bin -data
> ~/experiments/nerc/opennlp/en/conll03/en-testb.opennlp -encoding UTF-8
>
> If you open the brown.bin model you will see the cluster lexicon
> serialized inside the model.
>
> You can now use it like any other model; the TokenNameFinderFactory
> will reload all the required resources when the model is loaded in
> the TokenNameFinderME class.
>
> HTH,
>
> R
>
>
>
>
>
>
> On Mon, Feb 15, 2016 at 7:59 AM, Cohan Sujay Carlos 
> wrote:
> > Hi,
> >
> > I noticed that in the OpenNLP SVN 'trunk', the formerly deprecated
> > constructors for the class *NameFinderME*:
> >
> > *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> > generator, int beamSize, SequenceValidator sequenceValidator);*
> >
> > and
> >
> >
> > *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> > generator, int beamSize)*
> >
> > have been removed, along with
> >
> > *public NameFinderME(TokenNameFinderModel model, int beamSize)*
> >
> > The deprecation comments said:
> >
> > @deprecated the beam size is now configured during training time in the
> > trainer parameter file via beamSearch.beamSize
> >
> > and
> >
> > @deprecated Use {@link #NameFinderME(TokenNameFinderModel)} instead and use
> > the {@link TokenNameFinderFactory} to configure it.
> >
> > I wanted to point out a few potential problems:
> >

Re: Question about deprecated NameFinderME constructors

2016-03-07 Thread Rodrigo Agerri
Hi,

You can do all those tasks by using the create method in the
TokenNameFinderFactory:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/TokenNameFinderFactory.java?revision=1712553&view=markup#l100

For that you need to:

1. Provide the name of the factory class you are using, it could be
the same factory class: TokenNameFinderFactory.class.getName()
2. Create an XML descriptor and pass it as a byte[] array
3. Load the resources (e.g., clusters) in a resources map consisting
of the id of the resource and the serializer.
4. The sequenceCodec: BIO or BILOU.
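
As a rough illustration of how the two schemes in step 4 differ, the sketch below tags a single entity span both ways. It is only an example under stated assumptions: the class and method names are made up for this mail, and the tag strings use the common B-/I-/L-/U-/O notation, which is not necessarily how OpenNLP names its outcomes internally.

```java
import java.util.Arrays;

public class SequenceCodecSketch {

    // Tags tokens [start, end) as one entity of the given type using BIO:
    // B- opens the entity, I- continues it, O marks every other token.
    public static String[] encodeBio(int length, int start, int end, String type) {
        String[] tags = new String[length];
        Arrays.fill(tags, "O");
        for (int i = start; i < end; i++) {
            tags[i] = (i == start ? "B-" : "I-") + type;
        }
        return tags;
    }

    // Same span in BILOU: L- marks the last token of a multi-token entity
    // and U- marks a single-token entity.
    public static String[] encodeBilou(int length, int start, int end, String type) {
        String[] tags = new String[length];
        Arrays.fill(tags, "O");
        for (int i = start; i < end; i++) {
            String prefix;
            if (end - start == 1) {
                prefix = "U-";
            } else if (i == start) {
                prefix = "B-";
            } else if (i == end - 1) {
                prefix = "L-";
            } else {
                prefix = "I-";
            }
            tags[i] = prefix + type;
        }
        return tags;
    }

    public static void main(String[] args) {
        // Tokens: "Pierre" "Vinken" "joined" "Elsevier"; the person span is [0, 2).
        System.out.println(String.join(" ", encodeBio(4, 0, 2, "person")));
        System.out.println(String.join(" ", encodeBilou(4, 0, 2, "person")));
        System.out.println(String.join(" ", encodeBilou(4, 3, 4, "organization")));
    }
}
```

BILOU gives the model explicit evidence about where an entity ends and about single-token entities, at the cost of more outcomes to learn.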

The NameFinder documentation was already updated:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-docs/src/docbkx/namefinder.xml?view=markup

There is sample code to do that in the CLI class:

http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/namefind/TokenNameFinderTrainerTool.java?revision=1674262&view=markup

and to run it from the CLI:

1. Create an XML feature descriptor, e.g., brown-feature.xml

[XML descriptor not preserved in the archive]

2. Put your clustering lexicon(s) in a directory, e.g., clusters
3. Train: bin/opennlp TokenNameFinderTrainer -featuregen brown.xml
-resources clusters/ -params lang/ml/PerceptronTrainerParams.txt -lang
en -model brown.bin -data
~/experiments/nerc/opennlp/en/conll03/en-testb.opennlp -encoding UTF-8

If you open the brown.bin model you will see the cluster lexicon
serialized inside the model.

You can now use it like any other model; the TokenNameFinderFactory
will reload all the required resources when the model is loaded in
the TokenNameFinderME class.

HTH,

R






On Mon, Feb 15, 2016 at 7:59 AM, Cohan Sujay Carlos  wrote:
> Hi,
>
> I noticed that in the OpenNLP SVN 'trunk', the formerly deprecated
> constructors for the class *NameFinderME*:
>
> *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> generator, int beamSize, SequenceValidator sequenceValidator);*
>
> and
>
>
> *public NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator
> generator, int beamSize)*
>
> have been removed, along with
>
> *public NameFinderME(TokenNameFinderModel model, int beamSize)*
>
> The deprecation comments said:
>
> @deprecated the beam size is now configured during training time in the
> trainer parameter file via beamSearch.beamSize
>
> and
>
> @deprecated Use {@link #NameFinderME(TokenNameFinderModel)} instead and use
> the {@link TokenNameFinderFactory} to configure it.
>
> I wanted to point out a few potential problems:
>
> 1.  The corresponding train methods have not been removed.  So, it is
> possible to train a NameFinderME using a *custom* AdaptiveFeatureGenerator
> class to do feature engineering, but once a model has been so trained,
> there is no way to load and use the stored model with the same
> AdaptiveFeatureGenerator.
>
> 2.  There is still no documentation on the TokenNameFinderFactory which is
> supposed to replace the constructor with the AdaptiveFeatureGenerator.
>
> 3.  I went over the code of TokenNameFinderFactory and a few places where
> it is used and it seemed to be designed for working with an XML
> specification of feature combinations.  I have also in the references
> included a mailing list conversation that says this class should be used
> with an XML file.  However, it turns out that custom feature sets for
> sequential classification are often important, so might we be dropping
> valuable feature engineering support?
>
> Finally, in light of the above, could we keep the deprecated constructors
> around until the alternative constructor (using TokenNameFinderFactory)
> enters into production, and examples and documentation for it become widely
> available?
>
> References:
>
> On the TokenNameFinderFactory using XML:
> https://mail-archives.apache.org/mod_mbox/opennlp-dev/201410.mbox/%3CCAKvDkVDfAx5BMvwVOrbvpZm7xV9erRQzrzbCDpfd+Cq6m=x...@mail.gmail.com%3E
>
> Relevant JIRA issues:
> https://issues.apache.org/jira/browse/OPENNLP-718
> https://issues.apache.org/jira/browse/OPENNLP-717
>
> Thank you,
>
> Cohan Sujay Carlos


Re: OPENNLP-488

2016-03-07 Thread Rodrigo Agerri
Hi Jeffrey,

Thanks for contributing the patch, and sorry for not answering sooner.
We will take a look and get back to you with any feedback.

Cheers,

Rodrigo

On Fri, Mar 4, 2016 at 3:56 PM, Jeffrey Zemerick  wrote:
> Hi,
>
> I attached a patch to OPENNLP-488 to remove the NPE when training with
> no events. Still an OpenNLP beginner so all feedback is welcome.
>
> Thanks,
> Jeff