Hello,

Thank you for the feedback.
Please use this link to access a quick draft of the interface:
https://docs.google.com/document/d/10FfAoavKQfQBAWF-frpfltcIPQg6GFrsoD1XmTuGsHM/edit?pli=1

I believe the previously mentioned link did not allow edits to the
document.

As for the common interface: since supervised methods rely on
classifiers, they will need to load and save trained models, so we will
need to separate the two, maybe as in the draft. However, we could keep
a parent class with a common [disambiguate] method that can be used for
evaluation tasks and other purposes.

Thanks!

Anthony

> Date: Fri, 22 May 2015 09:18:39 +0200
> Subject: Re: GSoC 2015 - WSD Module
> From: kottm...@gmail.com
> To: dev@opennlp.apache.org
>
> Hello,
>
> one of the tasks we should start with is to define the interface for
> the WSD component.
>
> Please have a look at the other components in OpenNLP and try to
> propose an interface in a similar style.
> Can we use one interface for all the different implementations?
>
> Jörn
>
>
> On Mon, May 18, 2015 at 3:27 PM, Mondher Bouazizi <
> mondher.bouaz...@gmail.com> wrote:
>
> > Dear all,
> >
> > Sorry if you received multiple copies of this email (the links were
> > embedded). Here are the actual links:
> >
> > *Figure:*
> >
> > https://drive.google.com/file/d/0B7ON7bq1zRm3Sm1YYktJTVctLWs/view?usp=sharing
> >
> > *Semeval/senseval results summary:*
> >
> > https://docs.google.com/spreadsheets/d/1NCiwXBQs0rxUwtZ3tiwx9FZ4WELIfNCkMKp8rlnKObY/edit?usp=sharing
> >
> > *Literature survey of WSD techniques:*
> >
> > https://docs.google.com/spreadsheets/d/1WQbJNeaKjoT48iS_7oR8ifZlrd4CfhU1Tay_LLPtlCM/edit?usp=sharing
> >
> > Yours faithfully
> >
> > On Mon, May 18, 2015 at 10:17 PM, Anthony Beylerian <
> > anthonybeyler...@hotmail.com> wrote:
> >
> > > Please excuse the duplicate email, we could not attach the
> > > mentioned figure.
> > > Kindly find it here.
> > > Thank you.
> > >
> > > From: anthonybeyler...@hotmail.com
> > > To: dev@opennlp.apache.org
> > > Subject: GSoC 2015 - WSD Module
> > > Date: Mon, 18 May 2015 22:14:43 +0900
> > >
> > > Dear all,
> > >
> > > In the context of building a Word Sense Disambiguation (WSD)
> > > module, after doing a survey on WSD techniques, we realized the
> > > following points:
> > >
> > > - WSD techniques can be split into three sets (supervised,
> > >   unsupervised/knowledge-based, hybrid).
> > > - WSD is used for different, directly related objectives such as
> > >   all-words disambiguation, lexical sample disambiguation,
> > >   multi/cross-lingual approaches, etc.
> > > - Senseval/Semeval seem to be good references to compare different
> > >   techniques for WSD, since many of them were tested on the same
> > >   data (but different data at each event).
> > > - For the sake of making a first solution, we propose to start with
> > >   supporting the "lexical sample" type of disambiguation, meaning
> > >   to disambiguate single/limited word(s) from an input text.
> > >
> > > Therefore, we have decided to collect information about the
> > > different techniques in the literature (such as references,
> > > performance, parameters, etc.) in this spreadsheet here. We have
> > > also collected the results of all the senseval/semeval exercises
> > > here. (Note that each document has many sheets.) The collected
> > > results could help decide on which techniques to start with as main
> > > models for each set of techniques (supervised/unsupervised).
> > >
> > > We also propose a general approach for the package in the figure
> > > attached. The main components are as follows:
> > >
> > > 1- The different resources publicly available: WordNet, BabelNet,
> > > Wikipedia, etc. However, we would also like to allow the users to
> > > use their own local resources, by maybe defining a type of
> > > connector to the resource interface.
> > > 2- The resource interface will provide both a sense inventory that
> > > the user can query and a knowledge base (such as semantic or
> > > syntactic info, etc.) that might be used depending on the
> > > technique. We might even later consider building a local cache for
> > > remote services.
> > > 3- The WSD algorithms/techniques themselves, which will make use of
> > > the resource interface to access the resources required. These
> > > techniques will be split into two main packages as in the left side
> > > of the figure: Supervised/Unsupervised. The utils package includes
> > > common tools used in both types of techniques. The details
> > > mentioned in each package should be common to all implementations
> > > of these abstract models.
> > > 4- I/O could be processed in different formats (XML/JSON, etc.) or
> > > a simpler structure following your recommendations.
> > >
> > > If you have any suggestions or recommendations, we would really
> > > appreciate discussing them and would like your guidance to iterate
> > > on this tool-set.
> > >
> > > Best regards,
> > >
> > > Anthony Beylerian, Mondher Bouazizi
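
For illustration only, here is a minimal Java sketch of the parent-class idea from the reply at the top of this thread. All names (Disambiguator, SupervisedDisambiguator, saveModel, loadModel, and the two toy implementations) are hypothetical: they are not existing OpenNLP classes, and the linked draft may define things differently. The sketch only shows how a shared disambiguate method could sit alongside model persistence that exists on the supervised side alone:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not OpenNLP API.

/** Common parent: the one entry point shared by every technique. */
abstract class Disambiguator {
  /** Returns a sense identifier for tokens[targetIndex]. */
  abstract String disambiguate(String[] tokens, int targetIndex);
}

/** Unsupervised/knowledge-based side: nothing to load or save. */
class FirstSenseDisambiguator extends Disambiguator {
  // Toy sense inventory: word -> its first listed sense id.
  private final Map<String, String> firstSense;

  FirstSenseDisambiguator(Map<String, String> firstSense) {
    this.firstSense = new HashMap<>(firstSense);
  }

  @Override
  String disambiguate(String[] tokens, int targetIndex) {
    return firstSense.getOrDefault(tokens[targetIndex], "UNKNOWN");
  }
}

/** Supervised side: adds model persistence to the common method. */
abstract class SupervisedDisambiguator extends Disambiguator {
  abstract void saveModel(OutputStream out) throws IOException;
  abstract void loadModel(InputStream in) throws IOException;
}

/** Toy stand-in for a classifier: remembers one sense per word. */
class MemorizingDisambiguator extends SupervisedDisambiguator {
  private final Map<String, String> model = new HashMap<>();

  void train(String word, String senseId) {
    model.put(word, senseId);
  }

  @Override
  String disambiguate(String[] tokens, int targetIndex) {
    return model.getOrDefault(tokens[targetIndex], "UNKNOWN");
  }

  @Override
  void saveModel(OutputStream out) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(model.size());
    for (Map.Entry<String, String> entry : model.entrySet()) {
      data.writeUTF(entry.getKey());
      data.writeUTF(entry.getValue());
    }
    data.flush();
  }

  @Override
  void loadModel(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    model.clear();
    int size = data.readInt();
    for (int i = 0; i < size; i++) {
      String word = data.readUTF();
      String senseId = data.readUTF();
      model.put(word, senseId);
    }
  }
}

class WsdInterfaceSketch {
  public static void main(String[] args) throws IOException {
    MemorizingDisambiguator trained = new MemorizingDisambiguator();
    trained.train("bank", "bank%finance");

    // Round-trip the model through a byte buffer.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    trained.saveModel(buffer);
    MemorizingDisambiguator restored = new MemorizingDisambiguator();
    restored.loadModel(new ByteArrayInputStream(buffer.toByteArray()));

    // Evaluation code only needs the parent type.
    Disambiguator wsd = restored;
    String[] tokens = {"the", "bank", "approved", "the", "loan"};
    System.out.println(wsd.disambiguate(tokens, 1));
  }
}
```

Evaluation code written against the parent type would work for either family, while only supervised implementations would carry the load/save methods.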