Dear all,

Sorry if you received multiple copies of this email (the links had been embedded). Here are the actual links:
*Figure:* https://drive.google.com/file/d/0B7ON7bq1zRm3Sm1YYktJTVctLWs/view?usp=sharing
*Senseval/SemEval results summary:* https://docs.google.com/spreadsheets/d/1NCiwXBQs0rxUwtZ3tiwx9FZ4WELIfNCkMKp8rlnKObY/edit?usp=sharing
*Literature survey of WSD techniques:* https://docs.google.com/spreadsheets/d/1WQbJNeaKjoT48iS_7oR8ifZlrd4CfhU1Tay_LLPtlCM/edit?usp=sharing

Yours faithfully

On Mon, May 18, 2015 at 10:17 PM, Anthony Beylerian <anthonybeyler...@hotmail.com> wrote:

> Please excuse the duplicate email; we could not attach the mentioned figure.
> Kindly find it here.
> Thank you.
>
> From: anthonybeyler...@hotmail.com
> To: dev@opennlp.apache.org
> Subject: GSoC 2015 - WSD Module
> Date: Mon, 18 May 2015 22:14:43 +0900
>
> Dear all,
>
> In the context of building a Word Sense Disambiguation (WSD) module, after
> surveying WSD techniques, we noted the following points:
>
> - WSD techniques can be split into three sets: supervised,
>   unsupervised/knowledge-based, and hybrid.
> - WSD serves several closely related objectives, such as all-words
>   disambiguation, lexical sample disambiguation, multilingual/cross-lingual
>   approaches, etc.
> - Senseval/SemEval seem to be good references for comparing WSD techniques,
>   since many techniques were tested on the same data (although the data
>   differ from one event to the next).
> - For the sake of building a first solution, we propose to start by
>   supporting the "lexical sample" type of disambiguation, meaning
>   disambiguating a single word or a limited set of words in an input text.
>
> We have therefore collected information about the different techniques in
> the literature (references, performance, parameters, etc.) in this
> spreadsheet here. We have also collected the results of all the
> Senseval/SemEval exercises here. (Note that each document has several
> sheets.) The collected results could help decide which techniques to start
> with as the main models for each set of techniques
> (supervised/unsupervised).
>
> We also propose a general approach for the package, shown in the attached
> figure. The main components are as follows:
>
> 1- The publicly available resources: WordNet, BabelNet, Wikipedia, etc.
>    However, we would also like to let users plug in their own local
>    resources, perhaps by defining a type of connector to the resource
>    interface.
> 2- The resource interface will provide both a sense inventory that the
>    user can query and a knowledge base (semantic or syntactic information,
>    etc.) that may be used depending on the technique. We might later
>    consider building a local cache for remote services.
> 3- The WSD algorithms/techniques themselves, which will use the resource
>    interface to access the resources they require. These techniques will
>    be split into two main packages, as on the left side of the figure:
>    Supervised/Unsupervised. The utils package includes common tools used
>    by both types of techniques. The details listed in each package should
>    be common to all implementations of these abstract models.
> 4- I/O could be handled in different formats (XML/JSON, etc.) or in a
>    simpler structure, following your recommendations.
>
> If you have any suggestions or recommendations, we would really appreciate
> discussing them, and we would welcome your guidance as we iterate on this
> tool set.
>
> Best regards,
>
> Anthony Beylerian, Mondher Bouazizi
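The component layout in items 1 to 3 of the quoted proposal could be sketched in Java roughly as follows. This is only an illustration under stated assumptions. Every type name (`Sense`, `SenseInventory`, `KnowledgeBase`, `WSDResource`, `LocalResourceConnector`, `WSDisambiguator`, `SupervisedDisambiguator`, `SimplifiedLesk`) is hypothetical and is not part of the OpenNLP API. The knowledge-based model is a simplified Lesk gloss-overlap scorer, chosen purely as a small example of the unsupervised family and not as a technique selected in the proposal. The sketch covers only the "lexical sample" case: one target word per call.

```java
import java.util.*;

// Sketch only: every type below is a hypothetical name, not part of the OpenNLP API.

/** One sense of a word: an identifier plus a textual gloss. */
final class Sense {
    private final String id;
    private final String gloss;

    Sense(String id, String gloss) {
        this.id = id;
        this.gloss = gloss;
    }

    String id() { return id; }
    String gloss() { return gloss; }
}

/** Sense inventory role: which senses can a given lemma take? */
interface SenseInventory {
    List<Sense> getSenses(String lemma);
}

/** Knowledge-base role: extra information (here, related words) about a sense. */
interface KnowledgeBase {
    Set<String> getRelatedWords(Sense sense);
}

/** The resource interface offers both roles; each connector implements it. */
interface WSDResource extends SenseInventory, KnowledgeBase {}

/** Hypothetical connector for a user-supplied local resource, held in memory. */
class LocalResourceConnector implements WSDResource {
    private final Map<String, List<Sense>> senses = new HashMap<>();

    void addSense(String lemma, Sense sense) {
        senses.computeIfAbsent(lemma, k -> new ArrayList<>()).add(sense);
    }

    @Override
    public List<Sense> getSenses(String lemma) {
        return senses.getOrDefault(lemma, List.of());
    }

    @Override
    public Set<String> getRelatedWords(Sense sense) {
        // Assumption: the lowercased gloss words stand in for the "knowledge" about a sense.
        return new HashSet<>(Arrays.asList(sense.gloss().toLowerCase().split("\\W+")));
    }
}

/** Contract shared by all techniques: lexical-sample disambiguation of one target token. */
abstract class WSDisambiguator {
    protected final WSDResource resource;

    WSDisambiguator(WSDResource resource) { this.resource = resource; }

    /** Returns the chosen sense of tokens[targetIndex], or null if the lemma has no senses. */
    abstract Sense disambiguate(String[] tokens, int targetIndex);
}

/** Supervised family: models would be trained on sense-annotated examples first. */
abstract class SupervisedDisambiguator extends WSDisambiguator {
    SupervisedDisambiguator(WSDResource resource) { super(resource); }

    abstract void train(List<String[]> sentences, List<Integer> targets, List<Sense> goldSenses);
}

/** Unsupervised/knowledge-based example: simplified Lesk (gloss/context word overlap). */
class SimplifiedLesk extends WSDisambiguator {
    SimplifiedLesk(WSDResource resource) { super(resource); }

    @Override
    Sense disambiguate(String[] tokens, int targetIndex) {
        // Context = every other token in the input, lowercased.
        Set<String> context = new HashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i != targetIndex) context.add(tokens[i].toLowerCase());
        }
        Sense best = null;
        int bestOverlap = -1;
        for (Sense s : resource.getSenses(tokens[targetIndex].toLowerCase())) {
            Set<String> overlap = new HashSet<>(resource.getRelatedWords(s));
            overlap.retainAll(context);
            // Strict '>' keeps the earlier-listed sense on ties.
            if (overlap.size() > bestOverlap) {
                bestOverlap = overlap.size();
                best = s;
            }
        }
        return best;
    }
}
```

With two glosses registered for "bank" (one about a river bank, one about a financial institution), `new SimplifiedLesk(connector).disambiguate(new String[]{"I", "deposited", "money", "at", "the", "bank"}, 5)` picks the financial sense because "money" occurs in its gloss. Under this layout, a WordNet- or BabelNet-backed connector would implement `WSDResource`, a trained model would extend `SupervisedDisambiguator`, and neither change would affect the calling code.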