Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote:
 Dear Jörn,
 Thank you for that.
 
 After further surveying, I was thinking of implementing an approach based
 on context clustering as a next step.
 Maybe similar to the one in [1], which relies on a public (CC-A licensed)
 dataset [2]. Since clustering is usually done using K-means, which can take
 some time on large data, this has already been done and the results
 were made publicly available in [3], with up to 20 closest clusters per
 phrase.
 The authors in [1] propose to subsequently apply a Naive Bayes classifier, as
 described in their paper. I believe this is straightforward enough to
 implement as another unsupervised approach within the proposed time frame.
 I would like your opinion.

Sounds good to me. I will read the paper now, and come back here if I
have any questions.

Jörn
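To make the Naive Bayes step in the quoted proposal more concrete, here is a minimal sketch, assuming each context word has already been mapped to a cluster ID (for example from precomputed clusters such as those in [3]) and that some sense-labelled counts are available to estimate the probabilities. It is a generic multinomial Naive Bayes with Laplace smoothing, not necessarily the exact model of [1]; how [1] obtains its sense statistics is described in the paper and is not reproduced here. The class name `ClusterNaiveBayes` and the sense and cluster labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class ClusterNaiveBayes:
    """Hypothetical multinomial Naive Bayes over cluster-ID features.

    Each training example is (sense, [cluster ids of the context words]).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # Laplace smoothing constant
        self.sense_counts = Counter()              # examples per sense (prior)
        self.feature_counts = defaultdict(Counter)  # cluster counts per sense
        self.vocab = set()                         # all cluster ids seen

    def train(self, examples):
        for sense, clusters in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(clusters)
            self.vocab.update(clusters)

    def log_score(self, sense, clusters):
        # log P(sense) + sum of log P(cluster | sense), with add-alpha smoothing
        total_examples = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total_examples)
        total_features = sum(self.feature_counts[sense].values())
        denom = total_features + self.alpha * len(self.vocab)
        for c in clusters:
            score += math.log((self.feature_counts[sense][c] + self.alpha) / denom)
        return score

    def disambiguate(self, clusters):
        # Pick the sense with the highest posterior (up to a constant).
        return max(self.sense_counts, key=lambda s: self.log_score(s, clusters))


if __name__ == "__main__":
    nb = ClusterNaiveBayes()
    nb.train([
        ("bank/finance", ["c12", "c40"]),
        ("bank/finance", ["c12", "c7"]),
        ("bank/river", ["c3", "c9"]),
    ])
    print(nb.disambiguate(["c12"]))  # bank/finance
```

Working in log space avoids underflow when a context contains many cluster features.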


signature.asc
Description: This is a digitally signed message part


Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Wed, 2015-06-10 at 22:13 +0900, Anthony Beylerian wrote:
 Hi,
 
 I attached an initial patch to OPENNLP-758.
 However, we are currently modifying things a bit, since many approaches need
 to be supported, and we would like your recommendations.
 Here are some notes:
 
 1 - We used extJWNL.
 2 - [WSDisambiguator] is the main interface.
 3 - [Loader] loads the required resources.
 4 - Please check [FeaturesExtractor] for the methods mentioned by Rodrigo.
 5 - [Lesk] has many variants; we have already implemented some, but are
 wondering about the preferred way to switch from one to the other.
 As of now we use one of them as the default, but we thought of either
 adding a parameter list to fill in or making a separate class for each,
 or otherwise following your preference.
 6 - The other classes are for convenience.
 
 We will try to submit patches frequently for the separate issues, following
 the feedback.


Sounds good. I reviewed it and think what we have is quite OK.

The most important thing now is to fix the smaller issues (see the JIRA
issue) and explain to us how it can be run.

The midterm evaluation is coming up next week as well.

How are we standing with the milestone we set?

Jörn
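Note 5 in the quoted message asks how to switch between Lesk variants. One of the options mentioned is a parameter list passed to a single class. The following is a minimal Python sketch of that option, using the simplified Lesk idea (choose the sense whose gloss overlaps most with the words around the target). All names here (`Lesk`, `LeskParameters`, `LeskVariant`, `Sense`, the `bank%1` style sense IDs) are hypothetical and are not the classes in the OPENNLP-758 patch; real glosses would come from WordNet (via extJWNL in the patch) rather than being hard-coded, and stop words are not removed.

```python
from dataclasses import dataclass
from enum import Enum


class LeskVariant(Enum):
    BASIC = "basic"                  # overlap with the sense gloss only
    WITH_EXAMPLES = "with_examples"  # gloss plus the sense's example sentences


@dataclass
class LeskParameters:
    variant: LeskVariant = LeskVariant.BASIC
    window: int = 3  # context words taken on each side of the target word


@dataclass
class Sense:
    sense_id: str
    gloss: str
    examples: tuple = ()


class Lesk:
    """Simplified Lesk: pick the sense whose signature overlaps most with the context."""

    def __init__(self, params=None):
        self.params = params or LeskParameters()

    def _signature(self, sense):
        # The variant only changes which words make up a sense's signature.
        words = sense.gloss.lower().split()
        if self.params.variant is LeskVariant.WITH_EXAMPLES:
            for example in sense.examples:
                words.extend(example.lower().split())
        return set(words)

    def disambiguate(self, tokens, index, senses):
        lo = max(0, index - self.params.window)
        hi = index + self.params.window + 1
        context = {
            tok.lower()
            for i, tok in enumerate(tokens[lo:hi], start=lo)
            if i != index  # exclude the target word itself
        }
        # max() keeps the first sense on ties.
        best = max(senses, key=lambda s: len(self._signature(s) & context))
        return best.sense_id


if __name__ == "__main__":
    senses = [
        Sense("bank%1", "sloping land beside a body of water",
              ("they pulled the canoe up on the bank",)),
        Sense("bank%2", "a financial institution that accepts deposits",
              ("he cashed a check at the bank",)),
    ]
    tokens = "she sat on the river bank near the water".split()
    print(Lesk().disambiguate(tokens, 5, senses))  # bank%1
```

With this design, adding a variant means adding an enum value and a branch where the signature is built. The alternative from note 5, one class per variant, keeps each variant's logic isolated at the cost of more classes.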





Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote:
 Dear Jörn,
 Thank you for that.
 
 After further surveying, I was thinking of implementing an approach based
 on context clustering as a next step.
 Maybe similar to the one in [1], which relies on a public (CC-A licensed)
 dataset [2]. Since clustering is usually done using K-means, which can take
 some time on large data, this has already been done and the results
 were made publicly available in [3], with up to 20 closest clusters per
 phrase.
 The authors in [1] propose to subsequently apply a Naive Bayes classifier, as
 described in their paper. I believe this is straightforward enough to
 implement as another unsupervised approach within the proposed time frame.
 I would like your opinion.

Your users can just download the dataset and do the clustering
themselves. It should be possible to do that anyway, and all the code
necessary for it should be available as part of your contribution.

Jörn
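As a rough illustration of the clustering step users would run themselves, here is a minimal K-means (Lloyd's algorithm) sketch, assuming each phrase has already been turned into a numeric context vector. The function names (`kmeans`, `nearest`, `closest_clusters`) are hypothetical, and for a dataset the size of [2] an optimized library implementation would be preferable to this pure-Python loop. `closest_clusters` shows how a ranked list such as the "up to 20 closest clusters per phrase" in [3] could be produced from the final centroids.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point (first one on ties)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Initialise with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


def closest_clusters(point, centroids, n):
    """Indices of the n closest centroids, nearest first."""
    order = sorted(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))
    return order[:n]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents, assign = kmeans(pts, 2)
    print(assign)
    print(closest_clusters((10, 10), cents, 2))
```

Cluster indices are arbitrary labels, so only the grouping of points (not the numbers themselves) is meaningful from run to run.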

