Dear Jörn,
Thank you for that.

After further surveying, I am thinking of implementing an approach based on
context clustering as the next step.
It could be similar to the one in [1], which relies on a public (CC-A licensed)
dataset [2]. Clustering is usually done with K-means, which can take some time
on large data, but this has already been done and the results were made
publicly available in [3], with up to 20 closest clusters per "phrase".
The authors of [1] propose to subsequently apply a Naive Bayes classifier, as
described in their paper. I believe this is straightforward enough to implement
as another unsupervised approach within the proposed time frame.
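
To make the classification step more concrete, here is a rough sketch of a
Naive Bayes classifier whose features are the cluster IDs of the context
phrases. This is only an illustration under assumptions: the class name
ClusterNaiveBayes, the train/classify methods and the smoothing choice are
hypothetical and not taken from [1], and how cluster IDs are looked up in [3]
and how sense labels are obtained for training are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: multinomial Naive Bayes over context-cluster IDs. */
class ClusterNaiveBayes {
  // sense -> number of training examples seen for that sense
  private final Map<String, Integer> senseCounts = new HashMap<>();
  // sense -> (cluster ID -> how often it occurred in that sense's contexts)
  private final Map<String, Map<Integer, Integer>> featureCounts = new HashMap<>();
  // sense -> total number of cluster IDs observed for that sense
  private final Map<String, Integer> featureTotals = new HashMap<>();
  // all cluster IDs seen during training
  private final Set<Integer> vocabulary = new HashSet<>();
  private int totalExamples = 0;

  /** Adds one training context, given as the cluster IDs of its phrases. */
  void train(String sense, List<Integer> clusterIds) {
    senseCounts.merge(sense, 1, Integer::sum);
    totalExamples++;
    Map<Integer, Integer> counts =
        featureCounts.computeIfAbsent(sense, s -> new HashMap<>());
    for (int id : clusterIds) {
      counts.merge(id, 1, Integer::sum);
      featureTotals.merge(sense, 1, Integer::sum);
      vocabulary.add(id);
    }
  }

  /** Returns the most probable sense, or null if nothing was trained. */
  String classify(List<Integer> clusterIds) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Integer> entry : senseCounts.entrySet()) {
      String sense = entry.getKey();
      // log prior of the sense
      double score = Math.log((double) entry.getValue() / totalExamples);
      Map<Integer, Integer> counts = featureCounts.get(sense);
      int total = featureTotals.getOrDefault(sense, 0);
      // Add-one smoothing; the extra +1 reserves mass for unseen cluster IDs.
      double denominator = total + vocabulary.size() + 1;
      for (int id : clusterIds) {
        score += Math.log((counts.getOrDefault(id, 0) + 1.0) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        best = sense;
      }
    }
    return best;
  }
}
```

Scores are summed in log space to avoid underflow when a context contains
many phrases.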
I would like your opinion.
Regards,
Anthony
[1] http://nlp.cs.rpi.edu/paper/wsd.pdf
[2] http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
[3] http://webdocs.cs.ualberta.ca/~bergsma/PhrasalClusters/


> Date: Fri, 19 Jun 2015 16:41:20 +0200
> Subject: Re: GSoC 2015 - WSD Module
> From: [email protected]
> To: [email protected]
> 
> Hello,
> 
> I will dedicate time tonight to get this pulled in the sandbox and will
> then also provide some feedback.
> We can then create new patches against the sandbox to fix further issues.
> 
> Jörn
> 
> On Fri, Jun 19, 2015 at 11:02 AM, Anthony Beylerian <
> [email protected]> wrote:
> 
> > Thank you for the reply, I am guessing for now we will use the other
> > sources.
> >
> > By the way, I have uploaded a newer patch on the same issue [1].
> > I would like to know if the approach to set parameters is acceptable.
> >
> > Also, we are referencing some model files locally (tokenizer, tagger,
> > etc.) because we need them for the preprocessing chain. For example:
> >
> > ++++++++++++++++++++++
> > private static String modelsDir =
> > "src\\test\\resources\\opennlp\\tools\\disambiguator\\";
> >
> > TokenizerModel tokenizerModel = new TokenizerModel(new
> > FileInputStream(modelsDir + "en-token.bin"));
> > tokenizer = new TokenizerME(tokenizerModel);
> > ++++++++++++++++++++++
> >
> > We thought of adding these files (.bin) to the test folder, but could
> > anyone recommend a more elegant way to do this?
> > Thanks!
> >
> > Anthony
> >
> > [1] : https://issues.apache.org/jira/browse/OPENNLP-758
> >
> >
> > > From: [email protected]
> > > Date: Fri, 19 Jun 2015 10:18:12 +0200
> > > Subject: Re: GSoC 2015 - WSD Module
> > > To: [email protected]
> > >
> > > Thanks for the update and the updated patch.
> > >
> > > With respect to the licensing of BabelNet, I do not think we can
> > > redistribute CC BY-NC-SA resources here, but others in this project
> > > and Apache in general will probably know better than me.
> > >
> > > Best,
> > >
> > > Rodrigo
                                          
