Re: GSoC 2015 - WSD Module

Joern Kottmann Thu, 09 Jul 2015 09:30:07 -0700

Please open a Jira issue for this, and for the other GSoC tasks.
I would like to use Jira to plan the outstanding tasks.


Are you working on this currently?

Jörn

On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote:
> Dear Jörn,
> Thank you for that.
> 
> After further surveying, I was thinking of implementing an approach based 
> on context clustering as a next step.
> It might be similar to the one in [1], which relies on a public (CC-A 
> licensed) dataset [2]. Clustering is usually done with K-means, which can 
> take some time on large data, but this has already been done and the results 
> are publicly available in [3], with up to 20 closest clusters per "phrase".
> The authors of [1] propose to subsequently apply a Naive Bayes classifier, as 
> described in their paper. I believe this is straightforward enough to 
> implement as another unsupervised approach within the proposed time frame.
> I would like your opinion.
> Regards,
> Anthony
> [1] http://nlp.cs.rpi.edu/paper/wsd.pdf
> [2] http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
> [3] http://webdocs.cs.ualberta.ca/~bergsma/PhrasalClusters/
> 
> 
> > Date: Fri, 19 Jun 2015 16:41:20 +0200
> > Subject: Re: GSoC 2015 - WSD Module
> > From: kottm...@gmail.com
> > To: dev@opennlp.apache.org
> > 
> > Hello,
> > 
> > I will dedicate time tonight to get this pulled into the sandbox and will
> > then also provide some feedback.
> > We can then create new patches against the sandbox to fix further issues.
> > 
> > Jörn
> > 
> > On Fri, Jun 19, 2015 at 11:02 AM, Anthony Beylerian <
> > anthonybeyler...@hotmail.com> wrote:
> > 
> > > Thank you for the reply, I am guessing for now we will use the other
> > > sources.
> > >
> > > By the way, I have uploaded a newer patch on the same issue [1].
> > > I would like to know whether the approach to setting parameters is
> > > acceptable.
> > >
> > > Also, we are referencing some model files locally (tokenizer, tagger,
> > > etc.) because we need them for the preprocessing chain. For example:
> > >
> > > ++++++++++++++++++++++
> > > private static String modelsDir =
> > >     "src\\test\\resources\\opennlp\\tools\\disambiguator\\";
> > >
> > > TokenizerModel tokenizerModel = new TokenizerModel(
> > >     new FileInputStream(modelsDir + "en-token.bin"));
> > > tokenizer = new TokenizerME(tokenizerModel);
> > > ++++++++++++++++++++++
> > >
> > > I thought of adding these .bin files to the test folder, but could
> > > anyone recommend a more elegant way to do this?
> > > Thanks!
> > >
> > > Anthony
> > >
> > > [1] : https://issues.apache.org/jira/browse/OPENNLP-758
> > >
> > >
> > > > From: rage...@apache.org
> > > > Date: Fri, 19 Jun 2015 10:18:12 +0200
> > > > Subject: Re: GSoC 2015 - WSD Module
> > > > To: dev@opennlp.apache.org
> > > >
> > > > Thanks for the update and the updated patch.
> > > >
> > > > With respect to the licensing of BabelNet, I do not think we can
> > > > redistribute CC BY-NC-SA resources here, but others in this project
> > > > and Apache in general will probably know better than me.
> > > >
> > > > Best,
> > > >
> > > > Rodrigo
>
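
On the question above about a more elegant way to reference the model files: one common option is to resolve them from the test classpath instead of a hard-coded relative path. The following is only a minimal sketch, assuming a Maven-style layout in which src/test/resources is placed on the test classpath. The names TestModels, MODELS_PATH, resourceName and open are hypothetical and not part of OpenNLP or of the patch.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical test helper: resolves model files from the classpath
// rather than from a platform-specific relative file path.
final class TestModels {

    // Assumed classpath location, mirroring the directory used in the
    // quoted snippet (src/test/resources/opennlp/tools/disambiguator/).
    static final String MODELS_PATH = "/opennlp/tools/disambiguator/";

    private TestModels() {
    }

    // Builds the absolute classpath name for a model file.
    static String resourceName(String fileName) {
        return MODELS_PATH + fileName;
    }

    // Opens the model as a stream and fails with a clear message if the
    // resource is not on the classpath. The caller closes the stream.
    static InputStream open(String fileName) throws FileNotFoundException {
        String name = resourceName(fileName);
        InputStream in = TestModels.class.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + name);
        }
        return in;
    }
}
```

A test could then pass TestModels.open("en-token.bin") to the same TokenizerModel(InputStream) constructor used in the quoted snippet. Whether the .bin files should be committed to the test resources at all is a separate project decision.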
