Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-24 Thread Jason Baldridge
Hey all,

Just wanted to say that even though I'm not able to contribute at this
time, it's great to see all the activity! Gann Bierner also saw the recent
release and was super happy to see it.

BTW, for a tutorial I did last year, I tried out SpaCy and was really
impressed with the thought that went into it, and how much could be done
really easily out of the box. If you haven't looked at it, check it out, and
maybe think about design ideas that could be pulled into OpenNLP:

https://spacy.io/

To get a flavor of some things you can do out of the box with SpaCy, have a
look at this Jupyter notebook I put together:

https://github.com/utcompling/sastut/blob/master/sas2016_spacy_demo.ipynb

Keep up the good work!

-Jason

On Mon, 23 Jan 2017 at 23:11 Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Thank you and great work!
>
> ++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, Open Source Projects Formulation and Development Office (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
> On 1/23/17, 3:01 PM, "Suneel Marthi"  wrote:
>
> Thanks all for voting, it's past 72 hrs and below are the vote results:
>
> 5 +1 binding - Joern, Rodrigo, William, Tommaso, Suneel
> 2 +1 non-binding - Richard, Jeffrey
>
> This VOTE is now closed and the OpenNLP 1.7.1 release passes.
>
On Mon, Jan 23, 2017 at 10:02 AM, Rodrigo Agerri wrote:
>
> > +1 to release
> >
> > nice
> >
> > R
> >
> > On Mon, Jan 23, 2017 at 9:33 AM, Joern Kottmann wrote:
> >
> > > +1 binding
> > >
> > > Jörn
> > >
> > > On Jan 21, 2017 12:18 AM, "Suneel Marthi" wrote:
> > >
> > > The Apache OpenNLP PMC would like to call for a Vote on Apache
> OpenNLP
> > > 1.7.1 Release Candidate.
> > >
> > > The Release artifacts can be downloaded from:
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1008/org/apache/opennlp/opennlp-distr/1.7.1/
> > >
> > > The release was made from the Apache OpenNLP 1.7.1 tag at
> > > https://github.com/apache/opennlp/tree/opennlp-1.7.1
> > >
> > > To use it in a Maven build, set the version for opennlp-tools or
> > > opennlp-uima to 1.7.1 and add the following URL to your settings.xml
> > > file:
> > >
> > > https://repository.apache.org/content/repositories/orgapacheopennlp-1008
> > >
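> > > For example, a minimal settings.xml sketch (the profile and repository
> > > ids below are just illustrative names, not prescribed by the release):
> > >
> > > <settings>
> > >   <profiles>
> > >     <profile>
> > >       <id>opennlp-staging</id>
> > >       <repositories>
> > >         <repository>
> > >           <id>apache-opennlp-1008</id>
> > >           <url>https://repository.apache.org/content/repositories/orgapacheopennlp-1008</url>
> > >         </repository>
> > >       </repositories>
> > >     </profile>
> > >   </profiles>
> > >   <activeProfiles>
> > >     <activeProfile>opennlp-staging</activeProfile>
> > >   </activeProfiles>
> > > </settings>
> > >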
> > > The artifacts have been signed with the Key - D3541808 found at
> > > http://people.apache.org/keys/group/opennlp.asc
> > >
> > > Please vote on releasing these packages as Apache OpenNLP 1.7.1. The
> > > vote is open for the next 72 hours, or until a minimum of 3 binding +1
> > > PMC votes have been cast.
> > >
> > > Only votes from the OpenNLP PMC are binding, but folks are welcome to
> > > check the release candidate and voice their approval or disapproval.
> > > The vote passes if at least three binding +1 votes are cast.
> > >
> > > [ ] +1 Release the packages as Apache OpenNLP 1.7.1
> > > [ ] -1 Do not release the packages because...
> > > [ ]  0 I Care Less/I Don't Care
> > >
> > > Thanks again to all the committers and contributors for their work
> > > over the past few weeks.
> > >
> > > Suneel Marthi
> > >
> >
>
>
>


Re: Sentiment Analysis Parser updates

2016-06-22 Thread Jason Baldridge
Anastasija,

There might be a few appropriate sentiment datasets listed in my homework
on Twitter sentiment analysis:

https://github.com/utcompling/applied-nlp/wiki/Homework5

There may also be some useful data sets in the Crowdflower Open Data
collection:

https://www.crowdflower.com/data-for-everyone/

Hope this helps!

-Jason

On Wed, 22 Jun 2016 at 15:59 Anastasija Mensikova <
mensikova.anastas...@gmail.com> wrote:

> Hi everyone,
>
> Some updates on our Sentiment Analysis Parser work.
>
> You might have noticed that I have recently enhanced our website (the GH
> page), polished it and made it more user-friendly. My next step will be sending a
> pull request to Tika. However, my main goal until the end of Google Summer
> of Code is to enhance the parser in a way that will allow it to work
> categorically (in other words, the sentiment determined won't be just
> positive or negative, it will have a few categories). This means that my
> next step is to look for a categorical open data set (which I will
> hopefully do by the end of the weekend at the latest) and, of course, enhance
> my model and training. After that I will look into how the confidence
> levels can be increased.
>
> Have a great day/night!
>
> Thank you,
> Anastasija Mensikova.
>


Re: Performances of OpenNLP tools

2016-06-21 Thread Jason Baldridge
Jörn is absolutely right about that. Another good source of training data
is MASC. I've got some instructions for training models with MASC here:

https://github.com/scalanlp/chalk/wiki/Chalk-command-line-tutorial

Chalk (now defunct) provided a Scala wrapper around OpenNLP functionality,
so the instructions there should make it fairly straightforward to adapt
MASC data to OpenNLP.
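
As a concrete sketch (the file names here are hypothetical): once the MASC
data is converted to OpenNLP's one-sentence-per-line training format, a
model can be trained with the standard OpenNLP command line, e.g.:

  opennlp SentenceDetectorTrainer -model en-sent-masc.bin -lang en \
      -data masc-sent.train -encoding UTF-8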

-Jason

On Tue, 21 Jun 2016 at 10:46 Joern Kottmann <kottm...@gmail.com> wrote:

> There are some research papers which study and compare the performance of
> NLP toolkits, but be careful: they often don't train the NLP tools on the
> same data, and the training data makes a big difference in performance.
>
> Jörn
>
> On Tue, Jun 21, 2016 at 5:44 PM, Joern Kottmann <kottm...@gmail.com>
> wrote:
>
> > Just don't use the very old existing models; to get good results you have
> > to train on your own data, especially if the domain of the data used for
> > training and the data which should be processed don't match. The old
> > models are trained on 90s news; those don't work well on today's news and
> > probably much worse on tweets.
> >
> > OntoNotes is a good place to start if the goal is to process news. OpenNLP
> > comes with built-in support to train models from OntoNotes.
> >
> > Jörn
> >
> > On Tue, Jun 21, 2016 at 4:20 PM, Mattmann, Chris A (3980) <
> > chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >> This sounds like a fantastic idea.
> >>
> >> ++
> >> Chris Mattmann, Ph.D.
> >> Chief Architect
> >> Instrument Software and Science Data Systems Section (398)
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 168-519, Mailstop: 168-527
> >> Email: chris.a.mattm...@nasa.gov
> >> WWW:  http://sunset.usc.edu/~mattmann/
> >> ++
> >> Director, Information Retrieval and Data Science Group (IRDS)
> >> Adjunct Associate Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> WWW: http://irds.usc.edu/
> >> ++
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 6/21/16, 12:13 AM, "Anthony Beylerian" <anthonybeyler...@hotmail.com>
> >> wrote:
> >>
> >> >+1
> >> >
> >> >Maybe we could put the results of the evaluator tests for each
> >> >component somewhere on a webpage and update them on every release.
> >> >This is of course provided there are reasonable data sets for testing
> >> >each component.
> >> >What do you think?
> >> >
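> >> >For instance (the model and data file names here are hypothetical), the
> >> >numbers could come straight from the built-in evaluators, e.g.:
> >> >
> >> >  opennlp TokenNameFinderEvaluator -model en-ner-person.bin \
> >> >      -data en-ner-test.txt -encoding UTF-8
> >> >
> >> >which prints precision, recall and F-measure for the name finder.
> >> >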
> >> >Anthony
> >> >
> >> >> From: mondher.bouaz...@gmail.com
> >> >> Date: Tue, 21 Jun 2016 15:59:47 +0900
> >> >> Subject: Re: Performances of OpenNLP tools
> >> >> To: dev@opennlp.apache.org
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thank you for your replies.
> >> >>
> >> >> Jeffrey, please accept my apologies once more for sending the email
> >> >> twice.
> >> >>
> >> >> I also think it would be great to have such studies on the
> >> >> performance of OpenNLP.
> >> >>
> >> >> I have been looking for this information and checked in many places,
> >> >> including obviously Google Scholar, and I haven't found any serious
> >> >> studies or reliable results. Most of the existing ones report the
> >> >> performance of outdated releases of OpenNLP, and focus more on
> >> >> execution time or CPU/RAM consumption, etc.
> >> >>
> >> >> I think such a comparison will help not only evaluate the overall
> >> >> accuracy, but also highlight the issues with the existing models (as
> >> >> a matter of fact, the existing models fail to recognize many of the
> >> >> hashtags in tweets: the tokenizer splits them into the "#" symbol and
> >> >> a word that the PoS tagger also fails to recognize).
> >> >>
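> >> >> The hashtag behavior is easy to reproduce in a few lines of Java (a
> >> >> sketch; the model path is hypothetical):
> >> >>
> >> >>   import java.io.FileInputStream;
> >> >>   import java.io.InputStream;
> >> >>   import opennlp.tools.tokenize.TokenizerME;
> >> >>   import opennlp.tools.tokenize.TokenizerModel;
> >> >>
> >> >>   public class HashtagCheck {
> >> >>       public static void main(String[] args) throws Exception {
> >> >>           try (InputStream in = new FileInputStream("en-token.bin")) {
> >> >>               TokenizerME tokenizer = new TokenizerME(new TokenizerModel(in));
> >> >>               // With the old news-trained model, "#opennlp" typically
> >> >>               // comes back as two tokens: "#" and "opennlp".
> >> >>               for (String t : tokenizer.tokenize("Great release! #opennlp")) {
> >> >>                   System.out.println(t);
> >> >>               }
> >> >>           }
> >> >>       }
> >> >>   }
> >> >>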
> >> >> Therefore, building Twitter-based models would also be useful, since
> >> >> many of the works in acade

Re: Performances of OpenNLP tools

2016-06-20 Thread Jason Baldridge
It would be fantastic to have these numbers. This is an example of
something that would be a great contribution from someone trying to get
started in open source and who is maybe just getting into machine
learning and natural language processing.

For Twitter-ish text, it'd be great to look at models trained and evaluated
on the Tweet NLP resources:

http://www.cs.cmu.edu/~ark/TweetNLP/

It would also be interesting to compare how their models performed, etc.
Also, it's worth looking at spaCy (a Python NLP library) for further comparisons.

https://spacy.io/

-Jason

On Mon, 20 Jun 2016 at 10:41 Jeffrey Zemerick  wrote:

> I saw the same question on the users list on June 17. At least I thought it
> was the same question -- sorry if it wasn't.
>
> On Mon, Jun 20, 2016 at 11:37 AM, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Well, hold on. He sent that mail (as of the time of this mail) 4
> > mins previously. Maybe some folks need some time to reply ^_^
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 6/20/16, 8:23 AM, "Jeffrey Zemerick"  wrote:
> >
> > >Hi Mondher,
> > >
> > >Since you didn't get any replies I'm guessing no one is aware of any
> > >resources related to what you need. Google Scholar is a good place to
> look
> > >for papers referencing OpenNLP and its methods (in case you haven't
> > >searched it already).
> > >
> > >Jeff
> > >
> > >On Mon, Jun 20, 2016 at 11:19 AM, Mondher Bouazizi <
> > >mondher.bouaz...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Apologies if you received multiple copies of this email. I sent it to
> > the
> > >> users list a while ago, and haven't had an answer yet.
> > >>
> > >> I have been looking for a while to see whether there is any relevant
> > >> work that performed tests on the OpenNLP tools (in particular the
> > >> Lemmatizer, Tokenizer and PoS-Tagger) when used with short and noisy
> > >> texts such as Twitter data, etc., and/or compared them to other
> > >> libraries.
> > >>
> > >> By performance, I mean accuracy/precision, rather than execution
> > >> time, etc.
> > >>
> > >> If anyone can refer me to a paper or a work done in this context, that
> > >> would be of great help.
> > >>
> > >> Thank you very much.
> > >>
> > >> Mondher
> > >>
> >
>


Re: Question about OpenNLP and comparison to e.g., NLTK, Stanford NER, etc.

2015-11-12 Thread Jason Baldridge
As one of the people who got OpenNLP started in the late 1990s (for
research, but hoping it could be used by industry), it makes me smile to
know that lots of people use it happily to this day. :)

There are lots of new kids in town, but the licensing is often conflicted,
and the biggest benefits often come, as Joern mentions, from having the
right data to train your classifier.

Having said that, there is a lot of activity in the deep learning space,
where old techniques (neural nets) are now viable in ways they weren't
previously, and they are outperforming linear classifiers in task after
task. I'm currently looking at Deeplearning4J, and it would be great to
have OpenNLP or a project like it make solid NLP models available based on
deep learning methods, especially LSTMs and Convolutional Neural Nets.
Deeplearning4J is Java/Scala friendly and it is ASL, so that's at least
setting off on the right foot.

http://deeplearning4j.org/

The ND4J library (based on Numpy) that was built to support DL4J is also
likely to be useful for other Java projects that use machine learning.

-Jason

On Thu, 12 Nov 2015 at 09:44 Russ, Daniel (NIH/CIT) [E] wrote:

> Chris,
> Joern is correct.  However, if I may slightly disagree on a few minor
> points:
>
> 1) I use the old SourceForge models.  I find that the sources of error in
> my analysis are usually not due to mistakes in sentence detection or POS
> tagging.  I don’t have the annotated data or the time/money to build custom
> models.  Yes, the text I analyze is quite different from the corpus used to
> build the models (WSJ?), but it is good enough.
>
> 2)  MaxEnt is still a good classifier for NLP, and L-BFGS is just an
> algorithm to calculate the weights for the features.  It is an improvement
> on GIS, not a different classifier.  I am not familiar enough with CRFs to
> comment, but the seminal paper by Della Pietra (IEEE Trans. Pattern Anal.
> Mach. Intell., vol. 19, no. 4, 1997) makes it appear to be an extension of
> MaxEnt.  The Stanford NLP group has lecture slides online explaining why
> discriminative classification methods (e.g. MaxEnt) work better than
> generative (Naive Bayes) models (see
> https://web.stanford.edu/class/cs124/lec/Maximum_Entropy_Classifiers.pdf;
> particularly the example with traffic lights).
>
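> For reference, in the standard textbook formulation (not specific to
> OpenNLP), both GIS and L-BFGS fit the same conditional MaxEnt model,
>
>   p(y \mid x) = \frac{\exp\left(\sum_i w_i f_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i w_i f_i(x, y')\right)},
>
> and differ only in how the weights w_i are estimated, so switching the
> optimizer changes training speed and convergence, not the model family.
>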
> As I briefly mentioned earlier, OpenNLP is a mature product.  It has
> undergone some MAJOR upgrades.  It is not obsolete.  As for the other
> get dependency information.  OpenNLP just does not do it.  I don’t use NTLK
> because I need to.  If the need arises, I will.  I assume that you don’t
> have the time and money to learn every new NLP product.  I would say play
> to your strengths. If you know the package use it. Don’t change because
> it’s trendy.
>
>
>
> Daniel Russ, Ph.D.
> Staff Scientist, Division of Computational Bioscience
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda,  MD 20892-5624
>
> On Nov 11, 2015, at 4:41 PM, Joern Kottmann wrote:
>
> Hello,
>
>
>
> It is definitely true that OpenNLP has existed for a long time (more than
> 10 years), but that doesn't mean it wasn't improved. Actually, it has
> changed a lot in that period.
>
> The core strength of OpenNLP has always been that it can be used really
> easily to perform one of the supported NLP tasks.
>
> This was further improved with the 1.5 release, which added model packages
> that ensure that the components are always instantiated correctly across
> different runtime environments.
>
> The problem is that the system used to perform the training of a model
> and the system used to run it can be quite different. Prior to 1.5 it was
> possible to get that wrong, which resulted in hard-to-notice performance
> problems.
> I suspect that is an issue many of the competing solutions still have
> today.
>
> An example is the usage of String.toLowerCase(): its output depends on
> the platform locale.
>
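> A minimal sketch of that pitfall (the classic Turkish-locale case):
>
>   import java.util.Locale;
>
>   public class LocaleLowerCase {
>       public static void main(String[] args) {
>           String s = "TITLE";
>           // Under a Turkish default locale, 'I' lower-cases to the
>           // dotless 'ı', so the features computed at runtime differ
>           // from those the model saw at training time.
>           System.out.println(s.toLowerCase(new Locale("tr")));  // tıtle
>           System.out.println(s.toLowerCase(Locale.ENGLISH));    // title
>       }
>   }
>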
> One of the things that got a bit dated was the machine learning part of
> OpenNLP; this was addressed by adding more algorithms (e.g. perceptron
> and L-BFGS maxent). In addition, the machine learning part is now
> pluggable and can easily be swapped for a different implementation
> for testing or production use. The sandbox contains an experimental
> Mallet integration which offers all the Mallet classifiers; even CRFs can
> be used.
>
> On Fri, 2015-11-06 at 16:54 +, Mattmann, Chris A (3980) wrote:
> Hi Everyone,
>
> Hope you're well! I'm new to the list, however I just wanted to
> state I'm really happy with what I've seen with OpenNLP so far.
> My team and I have built a Tika Parser [1] that uses OpenNLP's
> location NER model, along with a Lucene Geo Names Gazetteer [2]
> to create a "GeoTopicParser". We are improving it day to day.
> OpenNLP has definitely come 

Re: [VOTE] Release OpenNLP 1.5.3 RC 3

2013-04-09 Thread Jason Baldridge
+1


On Tue, Apr 9, 2013 at 12:38 PM, Jörn Kottmann kottm...@gmail.com wrote:

 +1, all tests are good, let's release it.

 Jörn


 On 04/09/2013 02:51 PM, William Colen wrote:

 Hello,

 Let's vote to release RC 3 as OpenNLP 1.5.3.

 The testing of it is documented here:
 https://cwiki.apache.org/confluence/display/OPENNLP/TestPlan1.5.3

 The RC can be downloaded here:
 http://people.apache.org/~colen/releases/opennlp-1.5.3/rc3

 Please vote to approve this release:
 [ ] +1 Approve the release
 [ ] -1 Veto the release (please provide specific comments)
 [ ] 0   Don't care

 Please report any problems you may find.





-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Liblinear (was: OpenNLP 1.5.3 RC 2 ready for testing)

2013-03-22 Thread Jason Baldridge
I used the Java port. I actually pulled it into nak as nak.liblinear
because the model write/read code did it as text files and I needed access
to the Model member fields in order to do the serialization how I wanted.
Otherwise it remains as is. With a little bit of adaptation, you could
provide a Java wrapper in OpenNLP that follows the same pattern as my Scala
stuff. You'd just need to make it implement AbstractModel, which shouldn't
be too hard. (I have it implement LinearModel, which is just a slight
modification of MaxentModel, and I changed all uses of AbstractModel to
LinearModel in Chalk [the opennlp.tools portion]). -j
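
To make that pattern concrete, here is a rough sketch of such an adapter
(assumptions: the opennlp.model.MaxentModel interface of the 1.5.x line and
the de.bwaldvogel Java port of liblinear; the feature-index map and the
class name are hypothetical, and the class is left abstract since a full
version would implement the remaining interface methods as well):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import de.bwaldvogel.liblinear.FeatureNode;
  import de.bwaldvogel.liblinear.Linear;
  import de.bwaldvogel.liblinear.Model;
  import opennlp.model.MaxentModel;

  public abstract class LiblinearMaxentAdapter implements MaxentModel {

      private final Model model;                     // trained liblinear model
      private final Map<String, Integer> featureIds; // predicate -> feature index
      private final String[] outcomes;               // assumed to follow the
                                                     // model's label ordering

      protected LiblinearMaxentAdapter(Model model,
              Map<String, Integer> featureIds, String[] outcomes) {
          this.model = model;
          this.featureIds = featureIds;
          this.outcomes = outcomes;
      }

      public double[] eval(String[] context) {
          // Map string predicates to liblinear feature nodes; unknown
          // predicates are skipped, and the map is assumed to hand back
          // indices in increasing order (liblinear expects sorted input).
          List<FeatureNode> fs = new ArrayList<FeatureNode>();
          for (String pred : context) {
              Integer id = featureIds.get(pred);
              if (id != null) {
                  fs.add(new FeatureNode(id.intValue(), 1.0));
              }
          }
          double[] probs = new double[outcomes.length];
          // Requires a probabilistic solver (e.g. L2-regularized logistic
          // regression) so that predictProbability is supported.
          Linear.predictProbability(model, fs.toArray(new FeatureNode[0]), probs);
          return probs;
      }

      public String getBestOutcome(double[] ocs) {
          int best = 0;
          for (int i = 1; i < ocs.length; i++) {
              if (ocs[i] > ocs[best]) {
                  best = i;
              }
          }
          return outcomes[best];
      }

      public int getNumOutcomes() {
          return outcomes.length;
      }

      public String getOutcome(int i) {
          return outcomes[i];
      }
  }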

On Fri, Mar 22, 2013 at 9:32 AM, Jörn Kottmann kottm...@gmail.com wrote:

 Sounds interesting, I hope we will find the time to do that in OpenNLP
 after the 1.5.3 release too. We already discussed this and I think had
 consensus on making the machine learning pluggable and then offering a
 few addons for existing libraries.

 Good to know that liblinear works well. As far as I know it's written in
 C/C++; did you use the Java port of it, or did you write a JNI interface?

 Jörn

 On 03/22/2013 03:08 PM, Jason Baldridge wrote:

 BTW, I've just recently finished integrating Liblinear into Nak (which is
 an adaptation of the maxent portion of OpenNLP). I'm still rounding some
 things out, but so far it is producing more accurate models that are
 trained in less time and without using cutoffs. Here's the code:
 https://github.com/scalanlp/nak

 It is still mostly Java, but the liblinear adaptors are in Scala. I've
 kept
 things such that liblinear retrofits to the interfaces that were in
 opennlp.maxent, though given how well it is working, I'll be stripping
 those out and going with liblinear for everything in upcoming versions.

 Happy to answer any questions or help out with any of the above if it
 might
 be useful!





-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: sourceforge.net site

2013-03-01 Thread Jason Baldridge
Looks like Jorn must have taken care of it already! -Jason

On Thu, Feb 28, 2013 at 7:58 PM, James Kosin james_ko...@cox.net wrote:

 Jorn or Jason,

 Can one of you update the SourceForge site to point to our
 http://opennlp.apache.org site now that we are off the incubator site?
 We seem to have overlooked this there.
 I don't seem to have access to do this.

 Thanks,
 James




-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Next release

2013-02-14 Thread Jason Baldridge
+1

On Thu, Feb 14, 2013 at 6:31 AM, William Colen william.co...@gmail.com wrote:

 Hi!!

 I can be the Release Manager for 1.5.3. It would be nice because Jörn was
 the Release Manager for all the other releases and we should have other
 members of the team familiar with the process.

 I would like to nominate myself as the Release Manager for 1.5.3. I will
 start building our first RC under Jörn's supervision soon.

 Thank you,
 William

 On Wed, Dec 19, 2012 at 6:17 PM, Jörn Kottmann kottm...@gmail.com wrote:

  Let's start to get the release done. Are there any issues except the two
  open ones which need to go into this release?
 
  Open issues are:
  OPENNLP-541 Improve ADChunkSampleStream
  OPENNLP-402 CLI tools and formats refactored
 
  Jörn
 
 
  On 09/12/2012 03:56 PM, Jörn Kottmann wrote:
 
  Hi all,
 
  it has been a while since we released 1.5.2 and to me it looks
  like it's time for 1.5.3. I usually work with the trunk version now
  because it just contains too many fixes I need for my day job.
 
  I will volunteer to be release manager if nobody else wants to
  take this role.
 
  Any opinions?
 
  Jörn
 
 
 




-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Migrate to Git?

2012-12-20 Thread Jason Baldridge
+1, definitely.

On Wed, Dec 19, 2012 at 7:59 PM, William Colen william.co...@gmail.com wrote:

 +1 to move after 1.5.3

 William Colen



 On Wed, Dec 19, 2012 at 10:09 PM, James Kosin james.ko...@gmail.com
 wrote:

  I've used both
 
  The only thing is I find SVN a little easier on the beginner.
  Git has many options that aren't so obvious as to their purpose... until
  you get your hands dirty.  I've had several Git projects get into a state
  of not updating as a result.
 
  James Kosin
 
  On 12/19/2012 4:05 PM, Aliaksandr Autayeu wrote:
   I'm in favor, I use it anyway, it's much faster. I'd also wait till the
   1.5.3 release.
  
   Aliaksandr
  
   On Wed, Dec 19, 2012 at 9:09 PM, Jörn Kottmann kottm...@gmail.com
  wrote:
  
   Hi all,
  
   I heard at ApacheCon Europe that it should be possible to migrate from
   Subversion to Git.
  
   Is there any interest in doing that? If we decide to do it, I suggest we
   wait until the 1.5.3 release is done so we have a bit of time to also
   migrate our build process.
  
   Do all committers have experience with Git?
  
   Jörn
  
 
 




-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Host stock models in maven central

2012-08-08 Thread Jason Baldridge
Sorry if I missed something along the way -- who did the annotation of the
Wikipedia data?

BTW, the OANC will soon come out with their 3.0 release of MASC (the
Manually Annotated Sub-Corpus), with about 800k tokens of English text
(multiple domains, including Twitter, blogs, transcribed speech, and more)
labeled with several different levels of analysis, including chunks (noun
and verb), entities, tokens, POS tags, sentence boundaries, and logical
forms.

http://www.americannationalcorpus.org/MASC/Home.html

On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann kottm...@gmail.com wrote:

 On 08/08/2012 06:16 AM, Michael Schmitz wrote:

 Hi, here are some models trained on Wikipedia data.  They have similar
 performance.  Is this useful?


 Yes, people who do not have access to our MUC-based training
 data can just use the wiki data instead and combine it with their data.

 Thanks for sharing.

 Now all we need is a way to get label corrections from the community :-)

 Jörn




-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Syntactic roles with OpenNLP

2012-06-15 Thread Jason Baldridge
Try using MSTParser:

http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html


On Thu, Jun 14, 2012 at 3:48 PM, Jörn Kottmann kottm...@gmail.com wrote:

 On 06/14/2012 10:28 PM, Carlos Scheidecker wrote:

 What if you need to parse/divide a clause/phrase into syntactic roles?

 For instance: subject, object, preposition, direct object, indirect
 object.

 Is there any library or system that would do that with OpenNLP?

 Has anyone performed syntactic role classification/extraction using OpenNLP
 before?


 No, that is not possible with OpenNLP, but we are open to any
 contributions.

 Jörn




-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: OpenNLP 1.5.3 ....

2012-05-03 Thread Jason Baldridge
+1 to going forward with a 1.5.3 release that might not have all of the
aforementioned items in it. I'd say release often, release early through
the third digit in the version numbering and let 1.6, 1.7, etc. be larger
milestones. Regardless of this, the roadmap would be great.

On Thu, May 3, 2012 at 9:15 PM, James Kosin james.ko...@gmail.com wrote:

 On 5/3/2012 7:29 AM, Jörn Kottmann wrote:
  On 05/03/2012 01:20 PM, william.co...@gmail.com wrote:
   From my side I need to add things to the manual, e.g. about the
  evaluation reports and customization factories. But documentation can be
  finished while we try our release candidates.
 
  Besides that, there is an issue Jörn mentioned before: we are not
  supporting OSGi as we should, at least not the Customization Factories.
 
  My vote is +1, but I would ask a couple of weeks for me to implement the
  MutableDictionary and check the Customization Factories.
 
 
 
   I would really like to have a way to make the machine learning pluggable
   for this release, it shouldn't be that hard. But I haven't finished my
   proposal yet.
   This would make it easy for others to experiment with different
   classifiers, e.g. the ones in MALLET.
 
  Jörn
 +1.  Can we all get a consensus on the time-frame and a list of items we
 would like to get added for this release then?  I'm only saying it so
 that we at least have a plan others can track while waiting.  The only
 other option would be going back to the 1.5.2 release and maybe coming
 up with either a patch or a build for the critical issues and bugs that
 seem to produce undesirable results.  Maybe calling it 1.5.2.1 or
 something.





-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: Proposal to incorporate clojure-opennlp into OpenNLP sandbox

2012-04-04 Thread Jason Baldridge
+1

On Wed, Apr 4, 2012 at 2:41 PM, Jörn Kottmann kottm...@gmail.com wrote:

 I think it would be a great addition to the project and should be developed
 here at Apache and not on GitHub; doing it on GitHub splits up the community
 a bit, and people never really know what's the right place to ask questions
 if they run into a problem.

 Jörn


 On 04/04/2012 06:49 PM, Lee Hinman wrote:

 All,
 Jörn contacted me a while back about incorporating clojure-opennlp[1]
 into OpenNLP (I believe under the sandbox). I would like to propose that
 the contribution be added. I'm not sure what the exact steps are in order
 to get approval or disapproval (voting?).


 I also wanted to voice some concerns about whether I would be able to
 have commit access to the repository if the project is incorporated, just
 for people to be aware of.

 I'd like any kind of feedback about the contribution in general, please
 let me know if there are any other steps I need to take for this.

 - Lee Hinman

 [1]: https://github.com/dakrone/clojure-opennlp






-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge