Please excuse the duplicate email; we could not attach the figure mentioned 
below. Kindly find it here.
Thank you.

From: anthonybeyler...@hotmail.com
To: dev@opennlp.apache.org
Subject: GSoC 2015 - WSD Module
Date: Mon, 18 May 2015 22:14:43 +0900




Dear all,

In the context of building a Word Sense Disambiguation (WSD) module, we 
surveyed WSD techniques and noted the following points:
- WSD techniques can be split into three sets: supervised, 
  unsupervised/knowledge-based, and hybrid.
- WSD is used for several closely related objectives, such as all-words 
  disambiguation, lexical sample disambiguation, multilingual/cross-lingual 
  approaches, etc.
- Senseval/SemEval seem to be good references for comparing WSD techniques, 
  since many techniques were tested on the same data within an event (though 
  the data differs from one event to the next).
- As a first solution, we propose to start by supporting the "lexical sample" 
  type of disambiguation, that is, disambiguating a single word or a limited 
  set of words in an input text.
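
To make the "lexical sample" setting concrete, here is a minimal Java sketch 
of what one input instance might look like: a tokenized sentence plus the 
index of the single target word to disambiguate. The names used 
(LexicalSampleInstance, targetWord, contextWindow) are illustrative 
assumptions only; they are not part of OpenNLP or of any agreed design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical container for one lexical-sample instance: a tokenized
// sentence and the position of the one word to disambiguate.
final class LexicalSampleInstance {
    private final List<String> tokens;
    private final int targetIndex;

    LexicalSampleInstance(List<String> tokens, int targetIndex) {
        if (targetIndex < 0 || targetIndex >= tokens.size()) {
            throw new IllegalArgumentException("target index out of range");
        }
        this.tokens = tokens;
        this.targetIndex = targetIndex;
    }

    // The word whose sense is to be determined.
    String targetWord() {
        return tokens.get(targetIndex);
    }

    // Tokens within `radius` positions of the target, excluding the target.
    List<String> contextWindow(int radius) {
        int from = Math.max(0, targetIndex - radius);
        int to = Math.min(tokens.size(), targetIndex + radius + 1);
        List<String> window = new ArrayList<>(tokens.subList(from, to));
        window.remove(targetIndex - from); // remove(int): drops the target itself
        return window;
    }
}

class LexicalSampleDemo {
    public static void main(String[] args) {
        LexicalSampleInstance instance = new LexicalSampleInstance(
                Arrays.asList("I", "sat", "by", "the", "bank", "of", "the", "river"), 4);
        System.out.println(instance.targetWord());
        System.out.println(instance.contextWindow(2));
    }
}
```

An all-words system would instead treat every content word in the text as a 
target, which is why the lexical sample case seems a simpler starting point.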

We have collected information about the different techniques in the 
literature (references, performance, parameters, etc.) in this spreadsheet 
here. We have also collected the results of all the Senseval/SemEval 
exercises here. (Note that each document has many sheets.) The collected 
results could help decide which techniques to start with as the main models 
for each set of techniques (supervised/unsupervised).

We also propose a general approach for the package in the attached figure. 
The main components are as follows:
1- The different publicly available resources: WordNet, BabelNet, Wikipedia, 
   etc. However, we would also like to allow users to use their own local 
   resources, perhaps by defining a type of connector to the resource 
   interface.
2- The resource interface, whose role is to provide both a sense inventory 
   that the user can query and a knowledge base (such as semantic or 
   syntactic information) that might be used depending on the technique. We 
   might even later consider building a local cache for remote services.
3- The WSD algorithms/techniques themselves, which will use the resource 
   interface to access the required resources. These techniques will be 
   split into two main packages, as on the left side of the figure: 
   Supervised and Unsupervised. The utils package includes common tools used 
   in both types of techniques. The details mentioned in each package should 
   be common to all implementations of these abstract models.
4- I/O, which could be processed in different formats (XML, JSON, etc.) or in 
   a simpler structure, following your recommendations.
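
As a rough illustration of how components 1 to 3 might fit together, here is 
a minimal Java sketch. Every name in it (ResourceConnector, 
InMemoryConnector, ResourceInterface, Disambiguator, FirstSenseBaseline) and 
the sense identifiers are hypothetical placeholders, not existing OpenNLP 
classes or real WordNet keys. The "first listed sense" rule is only a 
stand-in so that the sketch runs; it is not one of the techniques under 
consideration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Component 1 (assumed shape): a connector to one concrete resource,
// whether public (e.g. WordNet) or a user's own local resource.
interface ResourceConnector {
    List<String> lookupSenses(String lemma); // ordered sense identifiers
}

// Toy in-memory connector standing in for a user-supplied local resource.
final class InMemoryConnector implements ResourceConnector {
    private final Map<String, List<String>> senses = new HashMap<>();

    void addEntry(String lemma, String... senseIds) {
        senses.put(lemma, Arrays.asList(senseIds));
    }

    @Override
    public List<String> lookupSenses(String lemma) {
        return senses.getOrDefault(lemma, Collections.emptyList());
    }
}

// Component 2 (assumed shape): the resource interface exposes a sense
// inventory and keeps a simple local cache of earlier lookups.
final class ResourceInterface {
    private final ResourceConnector connector;
    private final Map<String, List<String>> cache = new HashMap<>();

    ResourceInterface(ResourceConnector connector) {
        this.connector = connector;
    }

    List<String> senseInventory(String lemma) {
        return cache.computeIfAbsent(lemma, connector::lookupSenses);
    }
}

// Component 3 (assumed shape): abstract model shared by all techniques.
abstract class Disambiguator {
    protected final ResourceInterface resources;

    Disambiguator(ResourceInterface resources) {
        this.resources = resources;
    }

    // Returns a sense identifier for the token at targetIndex, if any.
    abstract Optional<String> disambiguate(List<String> tokens, int targetIndex);
}

// Placeholder technique: always picks the first sense the resource lists.
final class FirstSenseBaseline extends Disambiguator {
    FirstSenseBaseline(ResourceInterface resources) {
        super(resources);
    }

    @Override
    Optional<String> disambiguate(List<String> tokens, int targetIndex) {
        String lemma = tokens.get(targetIndex).toLowerCase(Locale.ROOT);
        List<String> inventory = resources.senseInventory(lemma);
        return inventory.isEmpty() ? Optional.empty() : Optional.of(inventory.get(0));
    }
}

class WsdArchitectureDemo {
    public static void main(String[] args) {
        InMemoryConnector connector = new InMemoryConnector();
        connector.addEntry("bank", "bank#financial", "bank#river"); // made-up IDs
        Disambiguator wsd = new FirstSenseBaseline(new ResourceInterface(connector));
        List<String> tokens = Arrays.asList("I", "sat", "by", "the", "bank");
        System.out.println(wsd.disambiguate(tokens, 4));
    }
}
```

Keeping the connector separate from the resource interface would let a 
remote service and a local file share the same caching layer, and a 
supervised or unsupervised technique would only need to extend the abstract 
class.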
If you have any suggestions or recommendations, we would really appreciate 
discussing them and would like your guidance to iterate on this tool-set.
Best regards,

Anthony Beylerian, Mondher Bouazizi
