Re: GSoC 2015 - WSD Module

Joern Kottmann Mon, 01 Jun 2015 11:31:06 -0700

Hello,

I had a look at your APIs.


Let's start with the WSDisambiguator. Should that be an interface?

// returns the senses ordered by their score (best one first, or only one
// in the supervised case)
String[] disambiguate(String inputText, int inputWordposition);

Shouldn't we have a tokenized input? Or is the inputText a token?
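
To make the tokenized-input question concrete, here is a minimal sketch
of what such an interface could look like. It is only an illustration,
not the draft API and not existing OpenNLP code: the parameter names,
the FirstSenseDisambiguator class and the "<token>#1" sense id format
are all made up for this example.

```java
import java.util.Locale;

/** Hypothetical sketch of a tokenized WSD interface; not the actual OpenNLP API. */
interface WSDisambiguator {
    /**
     * @param tokens      the already tokenized sentence
     * @param targetIndex index of the token to disambiguate
     * @return sense identifiers ordered by score, best one first
     */
    String[] disambiguate(String[] tokens, int targetIndex);
}

/** Placeholder implementation that always returns one made-up sense id. */
class FirstSenseDisambiguator implements WSDisambiguator {
    @Override
    public String[] disambiguate(String[] tokens, int targetIndex) {
        if (targetIndex < 0 || targetIndex >= tokens.length) {
            throw new IllegalArgumentException("targetIndex out of range: " + targetIndex);
        }
        // Assumption: sense ids look like "<token>#1". A real implementation
        // would look the token up in a sense inventory instead.
        return new String[] { tokens[targetIndex].toLowerCase(Locale.ROOT) + "#1" };
    }
}
```

With a token array the caller controls tokenization, and the index
refers unambiguously to a token rather than to a character offset.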

If you have resources you could package those into OpenNLP models and
use the existing serialization support. Would that work for you?

I think we should have different implementing classes for different
algorithms rather than grouping that in the Supervised and Unsupervised
classes. And also use the algorithm / approach name as part of the class
name.
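
As a rough illustration of one class per algorithm, the sketch below
names the class after the approach it implements. It uses a simplified
Lesk-style gloss overlap, chosen only as an example. The interface
shape, the class name LeskDisambiguator and the in-memory inventory
(word -> sense id -> gloss) are assumptions made for this sketch and
are not taken from the draft.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical common interface; not the actual OpenNLP API. */
interface WSDisambiguator {
    String[] disambiguate(String[] tokens, int targetIndex);
}

/** Simplified Lesk: rank senses by word overlap between gloss and context. */
class LeskDisambiguator implements WSDisambiguator {
    // Toy sense inventory (assumption): word -> (sense id -> gloss).
    private final Map<String, Map<String, String>> inventory;

    LeskDisambiguator(Map<String, Map<String, String>> inventory) {
        this.inventory = inventory;
    }

    @Override
    public String[] disambiguate(String[] tokens, int targetIndex) {
        if (targetIndex < 0 || targetIndex >= tokens.length) {
            throw new IllegalArgumentException("targetIndex out of range: " + targetIndex);
        }
        String target = tokens[targetIndex].toLowerCase(Locale.ROOT);
        Map<String, String> senses = inventory.getOrDefault(target, Collections.emptyMap());

        // Context = all other tokens of the sentence, lower-cased.
        Set<String> context = new HashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i != targetIndex) {
                context.add(tokens[i].toLowerCase(Locale.ROOT));
            }
        }

        // Score = number of distinct gloss words that also occur in the context.
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> sense : senses.entrySet()) {
            String[] glossWords = sense.getValue().toLowerCase(Locale.ROOT).split("\\s+");
            Set<String> gloss = new HashSet<>(Arrays.asList(glossWords));
            gloss.retainAll(context);
            scores.put(sense.getKey(), gloss.size());
        }

        // Highest overlap first; ties are broken by sense id for determinism.
        List<String> ranked = new ArrayList<>(senses.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked.toArray(new String[0]);
    }
}
```

Another approach would then get its own class with its own name,
implementing the same interface, so evaluation code can stay agnostic
of the algorithm behind it.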

As far as I understand, you have already started to work on this. Should
we do an initial code drop into the sandbox, and then work things out
from there? We strongly prefer to have as much of the source code editing
history as possible in our version control system.

Jörn 

On Sat, 2015-05-23 at 01:44 +0900, Anthony Beylerian wrote:
> Hello,
> 
> Thank you for the feedback.
> 
> Please use this link to access a quick draft of the interface :
> https://docs.google.com/document/d/10FfAoavKQfQBAWF-frpfltcIPQg6GFrsoD1XmTuGsHM/edit?pli=1
> 
> I believe the previously mentioned link was not allowing for document updates.
> 
> As for the common interface, since supervised methods rely on classifiers 
> they will need to load/save the training models, so we will need to separate 
> the two, maybe as in the draft.
> However we could keep a parent class with a common [disambiguate] method that 
> can be used for evaluation tasks and others.
> 
> Thanks !
> 
> Anthony
> 
> 
> 
> > Date: Fri, 22 May 2015 09:18:39 +0200
> > Subject: Re: GSoC 2015 - WSD Module
> > From: kottm...@gmail.com
> > To: dev@opennlp.apache.org
> > 
> > Hello,
> > 
> > one of the tasks we should start with is to define the interface for the
> > WSD component.
> > 
> > Please have a look at the other components in OpenNLP and try to propose an
> > interface in a similar style.
> > Can we use one interface for all the different implementations?
> > 
> > Jörn
> > 
> > 
> > On Mon, May 18, 2015 at 3:27 PM, Mondher Bouazizi <
> > mondher.bouaz...@gmail.com> wrote:
> > 
> > > Dear all,
> > >
> > > Sorry if you received multiple copies of this email (The links were
> > > embedded). Here are the actual links:
> > >
> > > *Figure:*
> > >
> > > https://drive.google.com/file/d/0B7ON7bq1zRm3Sm1YYktJTVctLWs/view?usp=sharing
> > > *Semeval/senseval results summary:*
> > >
> > > https://docs.google.com/spreadsheets/d/1NCiwXBQs0rxUwtZ3tiwx9FZ4WELIfNCkMKp8rlnKObY/edit?usp=sharing
> > > *Literature survey of WSD techniques:*
> > >
> > > https://docs.google.com/spreadsheets/d/1WQbJNeaKjoT48iS_7oR8ifZlrd4CfhU1Tay_LLPtlCM/edit?usp=sharing
> > >
> > > Yours faithfully
> > >
> > > On Mon, May 18, 2015 at 10:17 PM, Anthony Beylerian <
> > > anthonybeyler...@hotmail.com> wrote:
> > >
> > > > Please excuse the duplicate email, we could not attach the mentioned
> > > > figure.
> > > > Kindly find it here.
> > > > Thank you.
> > > >
> > > > From: anthonybeyler...@hotmail.com
> > > > To: dev@opennlp.apache.org
> > > > Subject: GSoC 2015 - WSD Module
> > > > Date: Mon, 18 May 2015 22:14:43 +0900
> > > >
> > > >
> > > >
> > > >
> > > > Dear all,
> > > > > In the context of building a Word Sense Disambiguation (WSD) module,
> > > > > after doing a survey on WSD techniques, we realized the following
> > > > > points :
> > > > >
> > > > > - WSD techniques can be split into three sets (supervised,
> > > > >   unsupervised/knowledge based, hybrid)
> > > > > - WSD is used for different directly related objectives such as
> > > > >   all-words disambiguation, lexical sample disambiguation,
> > > > >   multi/cross-lingual approaches etc.
> > > > > - Senseval/Semeval seem to be good references to compare different
> > > > >   techniques for WSD since many of them were tested on the same data
> > > > >   (but different one each event).
> > > > > - For the sake of making a first solution, we propose to start with
> > > > >   supporting the "lexical sample" type of disambiguation, meaning to
> > > > >   disambiguate single/limited word(s) from an input text.
> > > > >
> > > > > Therefore, we have decided to collect information about the different
> > > > > techniques in the literature (such as references, performance,
> > > > > parameters etc.) in this spreadsheet here. Otherwise we have also
> > > > > collected the results of all the senseval/semeval exercises here.
> > > > > (Note that each document has many sheets) The collected results,
> > > > > could help decide on which techniques to start with as main models
> > > > > for each set of techniques (supervised/unsupervised).
> > > > >
> > > > > We also propose a general approach for the package in the figure
> > > > > attached. The main components are as follows :
> > > > >
> > > > > 1- The different resources publicly available : WordNet, BabelNet,
> > > > >    Wikipedia, etc. However, we would also like to allow the users to
> > > > >    use their own local resources, by maybe defining a type of
> > > > >    connector to the resource interface.
> > > > > 2- The resource interface will have the role to provide both a sense
> > > > >    inventory that the user can query and a knowledge base (such as
> > > > >    semantic or syntactic info. etc.) that might be used depending on
> > > > >    the technique. We might even later consider building a local cache
> > > > >    for remote services.
> > > > > 3- The WSD algorithms/techniques themselves that will make use of the
> > > > >    resource interface to access the resources required. These
> > > > >    techniques will be split into two main packages as in the left
> > > > >    side of the figure : Supervised/Unsupervised. The utils package
> > > > >    includes common tools used in both types of techniques. The
> > > > >    details mentioned in each package should be common to all
> > > > >    implementations of these abstract models.
> > > > > 4- I/O could be processed in different formats (XML/JSON etc) or a
> > > > >    simpler structure following your recommendations.
> > > > >
> > > > > If you have any suggestions or recommendations, we would really
> > > > > appreciate discussing them and would like your guidance to iterate
> > > > > on this tool-set.
> > > > >
> > > > > Best regards,
> > > >
> > > > Anthony Beylerian, Mondher Bouazizi
> > > >
> > >
>
