Hello,

one of the first tasks we should start on is defining the interface for the WSD component.
Please have a look at the other components in OpenNLP and try to propose an interface in a similar style. Can we use one interface for all the different implementations?

Jörn

On Mon, May 18, 2015 at 3:27 PM, Mondher Bouazizi <mondher.bouaz...@gmail.com> wrote:

> Dear all,
>
> Sorry if you received multiple copies of this email (the links were
> embedded). Here are the actual links:
>
> *Figure:*
> https://drive.google.com/file/d/0B7ON7bq1zRm3Sm1YYktJTVctLWs/view?usp=sharing
>
> *Semeval/senseval results summary:*
> https://docs.google.com/spreadsheets/d/1NCiwXBQs0rxUwtZ3tiwx9FZ4WELIfNCkMKp8rlnKObY/edit?usp=sharing
>
> *Literature survey of WSD techniques:*
> https://docs.google.com/spreadsheets/d/1WQbJNeaKjoT48iS_7oR8ifZlrd4CfhU1Tay_LLPtlCM/edit?usp=sharing
>
> Yours faithfully
>
> On Mon, May 18, 2015 at 10:17 PM, Anthony Beylerian <anthonybeyler...@hotmail.com> wrote:
>
> > Please excuse the duplicate email; we could not attach the mentioned
> > figure. Kindly find it here.
> > Thank you.
> >
> > From: anthonybeyler...@hotmail.com
> > To: dev@opennlp.apache.org
> > Subject: GSoC 2015 - WSD Module
> > Date: Mon, 18 May 2015 22:14:43 +0900
> >
> > Dear all,
> >
> > In the context of building a Word Sense Disambiguation (WSD) module, and
> > after surveying WSD techniques, we noted the following points:
> >
> > - WSD techniques can be split into three sets: supervised,
> >   unsupervised/knowledge-based, and hybrid.
> > - WSD serves several closely related objectives, such as all-words
> >   disambiguation, lexical sample disambiguation, and multilingual or
> >   cross-lingual approaches.
> > - Senseval/Semeval seem to be good references for comparing WSD
> >   techniques, since many techniques were tested on the same data within
> >   each event (the data differs from one event to the next).
> > - For a first solution, we propose to start by supporting the "lexical
> >   sample" type of disambiguation, i.e. disambiguating a single word or a
> >   limited set of words in an input text.
> >
> > We have therefore collected information about the different techniques
> > in the literature (references, performance, parameters, etc.) in this
> > spreadsheet here. We have also collected the results of all the
> > Senseval/Semeval exercises here. (Note that each document has several
> > sheets.) These results could help us decide which techniques to start
> > with as the main models for each set of techniques
> > (supervised/unsupervised).
> >
> > We also propose a general design for the package, shown in the attached
> > figure. The main components are as follows:
> >
> > 1- The publicly available resources: WordNet, BabelNet, Wikipedia, etc.
> >    We would also like to let users plug in their own local resources,
> >    perhaps by defining a type of connector to the resource interface.
> > 2- The resource interface will provide both a sense inventory that the
> >    user can query and a knowledge base (semantic or syntactic
> >    information, etc.) that a technique may use as needed. Later we might
> >    also consider building a local cache for remote services.
> > 3- The WSD algorithms/techniques themselves, which will use the resource
> >    interface to access the resources they require. These techniques will
> >    be split into two main packages, Supervised and Unsupervised, as on the
> >    left side of the figure. The utils package holds common tools used by
> >    both types of techniques. The details listed in each package should be
> >    common to all implementations of these abstract models.
> > 4- I/O could be handled in different formats (XML, JSON, etc.) or in a
> >    simpler structure, following your recommendations.
> >
> > If you have any suggestions or recommendations, we would be glad to
> > discuss them, and we would appreciate your guidance as we iterate on this
> > tool-set.
> >
> > Best regards,
> >
> > Anthony Beylerian, Mondher Bouazizi
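As one possible answer to the question of whether a single interface can cover all implementations, here is a minimal Java sketch. It assumes the WSD component follows the style of OpenNLP components such as the POS tagger, which operate on a String[] of tokens. Every name in it (WsdSketch, Sense, SenseInventory, WSDisambiguator, MapSenseInventory, SimplifiedLesk, and the "bank%..." sense ids) is hypothetical and not part of OpenNLP. A toy in-memory map stands in for the resource interface (points 1 and 2), and simplified Lesk (gloss/context word overlap) stands in for one unsupervised, knowledge-based technique (point 3).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in OpenNLP.
class WsdSketch {

  /** A sense: an identifier plus a gloss (definition) used by knowledge-based methods. */
  static final class Sense {
    final String id;
    final String gloss;

    Sense(String id, String gloss) {
      this.id = id;
      this.gloss = gloss;
    }
  }

  /** Resource interface (point 2): a sense inventory the user can query. */
  interface SenseInventory {
    /** Candidate senses for a lemma; an empty list if the lemma is unknown. */
    List<Sense> getSenses(String lemma);
  }

  /**
   * One interface for all implementations, supervised or not: given the tokens
   * of a sentence and the index of the target word, return a sense id.
   */
  interface WSDisambiguator {
    String disambiguate(String[] tokens, int targetIndex);
  }

  /** A user-supplied local resource (point 1's "connector"), here an in-memory map. */
  static class MapSenseInventory implements SenseInventory {
    private final Map<String, List<Sense>> senses = new HashMap<>();

    void add(String lemma, Sense sense) {
      senses.computeIfAbsent(lemma, k -> new ArrayList<>()).add(sense);
    }

    @Override
    public List<Sense> getSenses(String lemma) {
      return senses.getOrDefault(lemma, List.of());
    }
  }

  /**
   * Simplified Lesk: pick the sense whose gloss shares the most words with the
   * surrounding context. No stop-word filtering; ties keep the first-listed sense.
   */
  static class SimplifiedLesk implements WSDisambiguator {
    private final SenseInventory inventory;

    SimplifiedLesk(SenseInventory inventory) {
      this.inventory = inventory;
    }

    @Override
    public String disambiguate(String[] tokens, int targetIndex) {
      // Context = all lower-cased tokens except the target word itself.
      Set<String> context = new HashSet<>();
      for (int i = 0; i < tokens.length; i++) {
        if (i != targetIndex) {
          context.add(tokens[i].toLowerCase());
        }
      }
      String best = null; // stays null if the inventory has no senses for the target
      int bestOverlap = -1;
      for (Sense sense : inventory.getSenses(tokens[targetIndex].toLowerCase())) {
        int overlap = 0;
        for (String word : sense.gloss.toLowerCase().split("\\W+")) {
          if (context.contains(word)) {
            overlap++;
          }
        }
        if (overlap > bestOverlap) { // strict '>' keeps the earlier sense on ties
          bestOverlap = overlap;
          best = sense.id;
        }
      }
      return best;
    }
  }

  public static void main(String[] args) {
    // Toy two-sense inventory for "bank" (illustrative data only).
    MapSenseInventory inventory = new MapSenseInventory();
    inventory.add("bank", new Sense("bank%finance", "an institution that accepts deposits of money"));
    inventory.add("bank", new Sense("bank%river", "sloping land beside a river or stream"));

    WSDisambiguator wsd = new SimplifiedLesk(inventory);
    // prints bank%river
    System.out.println(wsd.disambiguate(new String[] {"He", "sat", "on", "the", "river", "bank"}, 5));
    // prints bank%finance
    System.out.println(wsd.disambiguate(new String[] {"She", "deposits", "money", "at", "the", "bank"}, 5));
  }
}
```

A supervised implementation would implement the same disambiguate method, backed by a trained model instead of glosses, which is what would let one interface serve both packages. Richer inputs (POS tags, lemmas) or knowledge-base lookups could be added through the resource interface without changing the method signature, though that is a design choice still open for discussion.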