Re: GSoC 2015 - WSD Module

Mondher Bouazizi Fri, 22 May 2015 07:20:14 -0700

Hi all,

Thanks Rodrigo for the feedback.
I don't mind starting with IMS implementation as a first supervised
solution.
It seems to be a good first step.
As for the SST, I will read more about it and will let you know.


Separately, what do you think of the following interface, which Anthony
and I prepared based on Jörn's recommendation?
We tried to stay as close as possible to the tools already implemented.

Link :
https://drive.google.com/file/d/0B7ON7bq1zRm3NTI1bGFfc3lZX0U/view?usp=sharing
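
For readers who cannot open the linked document, here is a minimal sketch
of what an interface in the style of the existing tools might look like.
It is not the interface from the link: WordSenseDisambiguator,
FirstSenseDisambiguator, the method names and the sense identifiers are
all hypothetical, and the "first sense" baseline is only there to make
the example runnable.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: tokens in, one sense identifier out.
// NOT the interface from the linked document.
interface WordSenseDisambiguator {
    /** Returns a sense id for the token at targetIndex, or null if unknown. */
    String disambiguate(String[] tokens, int targetIndex);
}

// Trivial baseline: returns the first sense registered for a word.
final class FirstSenseDisambiguator implements WordSenseDisambiguator {
    private final Map<String, String> firstSense = new HashMap<>();

    /** Registers a sense; only the first sense added per word is kept. */
    void addSense(String word, String senseId) {
        firstSense.putIfAbsent(word.toLowerCase(Locale.ROOT), senseId);
    }

    @Override
    public String disambiguate(String[] tokens, int targetIndex) {
        if (targetIndex < 0 || targetIndex >= tokens.length) {
            throw new IllegalArgumentException("targetIndex out of range");
        }
        return firstSense.get(tokens[targetIndex].toLowerCase(Locale.ROOT));
    }
}

final class WsdInterfaceDemo {
    public static void main(String[] args) {
        FirstSenseDisambiguator wsd = new FirstSenseDisambiguator();
        wsd.addSense("bank", "bank%1"); // made-up sense identifier
        String[] tokens = {"She", "sat", "by", "the", "bank"};
        System.out.println(wsd.disambiguate(tokens, 4));
    }
}
```

Taking the token array plus a target index would cover the lexical
sample case directly; an all-words variant could loop over the indices.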

Best regards,

Mondher, Anthony



On Fri, May 22, 2015 at 9:59 PM, Rodrigo Agerri <rage...@apache.org> wrote:

> Hello Mondher (my response is about supervised WSD),
>
> Thanks for the info, it is quite interesting. Apart from the comment
> by Jörn, which I think is very important if we want to achieve
> something given the time constraints of GSoC, I have a couple of
> recommendations/comments of my own:
>
> 1. Rather than targeting the lexical sample task or all-words WSD, I
> think it would be more practical to choose an approach/algorithm and
> try to implement it in OpenNLP. One of the most (if not the most)
> popular approaches is the "It Makes Sense" (IMS) system:
>
> http://www.comp.nus.edu.sg/~nlp/sw/README.txt
> https://www.comp.nus.edu.sg/~nght/pubs/ims.pdf
>
> I think that is achievable in the GSoC time frame.
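
As the IMS paper linked above describes, the system trains a classifier
per target word on features such as surrounding words, POS tags of
neighbouring words, and local collocations. Below is a minimal sketch of
two of those feature types. The class name, the feature-string formats,
the "<pad>" placeholder and the choice to skip the target position in a
collocation are assumptions of this sketch, not details taken from IMS
or from OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names and feature formats are illustrative only.
final class ImsStyleFeatures {

    /** Bag-of-words features: every token in the sentence except the target. */
    static List<String> surroundingWords(String[] tokens, int target) {
        List<String> feats = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i != target) {
                feats.add("sw=" + tokens[i].toLowerCase(Locale.ROOT));
            }
        }
        return feats;
    }

    /**
     * Local collocation: the ordered tokens at offsets from..to relative to
     * the target. The target itself (offset 0) is skipped, and positions
     * outside the sentence are padded with "<pad>".
     */
    static String collocation(String[] tokens, int target, int from, int to) {
        StringBuilder sb = new StringBuilder("col[" + from + "," + to + "]=");
        boolean first = true;
        for (int off = from; off <= to; off++) {
            if (off == 0) {
                continue;
            }
            int i = target + off;
            String tok = (i >= 0 && i < tokens.length)
                    ? tokens[i].toLowerCase(Locale.ROOT)
                    : "<pad>";
            if (!first) {
                sb.append('_');
            }
            sb.append(tok);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "went", "to", "the", "bank", "yesterday"};
        int target = 4; // index of "bank"
        System.out.println(surroundingWords(tokens, target));
        System.out.println(collocation(tokens, target, -2, -1));
        System.out.println(collocation(tokens, target, -1, 1));
    }
}
```

The resulting feature strings could then be fed to whichever classifier
is chosen for the per-word models.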
>
> 2. As an aside, research has been moving towards supersense tagging
> (SST), given the difficulty of WSD.
>
> http://ttic.uchicago.edu/~altun/pubs/CiaAlt_EMNLP06.pdf
>
> As you can see in the above paper, SST is approached as a sequence
> labelling task rather than a classification task. This means that we
> could reimplement the features of Ciaramita and Altun (2006) as
> AdaptiveFeatureGenerators and create a module structurally similar
> to the NameFinder, but for SST.
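
To illustrate what "sequence labelling" means here: each token gets a
B-/I-/O tag carrying a supersense label, and contiguous tags are decoded
into spans, much as a name finder returns spans. The sketch below is
plain Java with hypothetical class names (SupersenseSpans, Span); it
does not use the OpenNLP API, and the example sentence and tags are made
up. The labels follow the WordNet lexicographer-file naming
(noun.person, verb.motion, noun.artifact).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for BIO-encoded supersense tags.
final class SupersenseSpans {

    /** A labelled token span covering indices [start, end). */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** Decodes tags such as B-noun.person / I-noun.person / O into spans. */
    static List<Span> decode(String[] tags) {
        List<Span> spans = new ArrayList<>();
        int start = -1;
        String type = null;
        // Iterate one step past the end so an open span is always closed.
        for (int i = 0; i <= tags.length; i++) {
            String tag = i < tags.length ? tags[i] : "O";
            boolean continues = type != null && tag.equals("I-" + type);
            if (type != null && !continues) {
                spans.add(new Span(start, i, type));
                type = null;
            }
            if (tag.startsWith("B-")) {
                start = i;
                type = tag.substring(2);
            }
            // An I- tag with no open span of the same type is ignored.
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"Clara", "Harris", "visited", "the", "theatre"};
        String[] tags = {"B-noun.person", "I-noun.person", "B-verb.motion",
                         "O", "B-noun.artifact"};
        for (Span s : decode(tags)) {
            System.out.println(s + " -> "
                    + String.join(" ", java.util.Arrays.copyOfRange(tokens, s.start, s.end)));
        }
    }
}
```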
>
> This also has the advantage of letting us move beyond older datasets
> such as SemCor and Senseval to current ones, such as tweet datasets.
> See this recent paper on SST on tweets:
>
> http://aclweb.org/anthology/S14-1001
>
> I think that for supervised WSD we should pursue option 1 or 2 and
> start defining the interface as Jörn has suggested.
>
> Best,
>
> Rodrigo
>
> On Mon, May 18, 2015 at 2:14 PM, Anthony Beylerian
> <anthonybeyler...@hotmail.com> wrote:
> > Dear all,
> >
> > In the context of building a Word Sense Disambiguation (WSD) module,
> > after doing a survey on WSD techniques, we realized the following
> > points:
> >
> > - WSD techniques can be split into three sets (supervised,
> > unsupervised/knowledge based, hybrid)
> >
> > - WSD covers several closely related tasks, such as all-words
> > disambiguation, lexical sample disambiguation, multi/cross-lingual
> > approaches, etc.
> >
> > - Senseval/SemEval seem to be good references for comparing different
> > techniques for WSD, since many of them were tested on the same data
> > (though each event used a different dataset).
> >
> > - For the sake of making a first solution, we propose to start by
> > supporting the "lexical sample" type of disambiguation, that is,
> > disambiguating one or a limited set of target words in an input text.
> >
> >
> > Therefore, we have decided to collect information about the different
> > techniques in the literature (such as references, performance,
> > parameters, etc.) in this spreadsheet here.
> > We have also collected the results of all the Senseval/SemEval
> > exercises here.
> > (Note that each document has many sheets.)
> > The collected results could help decide which techniques to start
> > with as the main models for each set of techniques
> > (supervised/unsupervised).
> >
> > We also propose a general approach for the package in the attached
> > figure.
> > The main components are as follows:
> >
> > 1- The different publicly available resources: WordNet, BabelNet,
> > Wikipedia, etc.
> > However, we would also like to allow users to plug in their own local
> > resources, perhaps by defining a type of connector to the resource
> > interface.
> >
> > 2- The resource interface will provide both a sense inventory that
> > the user can query and a knowledge base (such as semantic or syntactic
> > information) that might be used depending on the technique.
> > We might even consider building a local cache for remote services
> > later.
> >
> > 3- The WSD algorithms/techniques themselves, which will use the
> > resource interface to access the resources they require.
> > These techniques will be split into two main packages, as shown on the
> > left side of the figure: Supervised/Unsupervised.
> > The utils package includes common tools used in both types of
> > techniques.
> > The details mentioned in each package should be common to all
> > implementations of these abstract models.
> >
> > 4- I/O could be processed in different formats (XML/JSON, etc.) or in
> > a simpler structure, following your recommendations.
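
Components 1 and 2 above (pluggable resources behind a resource
interface, with a possible local cache) could be sketched roughly as
follows. Every name here (SenseInventory, InMemoryInventory,
CachingInventory) and every sense identifier is hypothetical; the
in-memory map merely stands in for WordNet, BabelNet or a remote
service, and the lookup counter exists only to show the cache working.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource interface: a queryable sense inventory.
interface SenseInventory {
    /** Returns the known sense ids for a lemma (empty if none). */
    List<String> sensesOf(String lemma);
}

// Stand-in connector for a local, user-supplied resource.
final class InMemoryInventory implements SenseInventory {
    private final Map<String, List<String>> senses = new HashMap<>();
    int lookups = 0; // counts backend hits, for demonstration only

    void put(String lemma, List<String> senseIds) {
        senses.put(lemma, List.copyOf(senseIds));
    }

    @Override
    public List<String> sensesOf(String lemma) {
        lookups++;
        return senses.getOrDefault(lemma, List.of());
    }
}

// Local cache in front of any (possibly remote) inventory.
final class CachingInventory implements SenseInventory {
    private final SenseInventory delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingInventory(SenseInventory delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> sensesOf(String lemma) {
        // Only the first request per lemma reaches the delegate.
        return cache.computeIfAbsent(lemma, delegate::sensesOf);
    }
}

final class ResourceDemo {
    public static void main(String[] args) {
        InMemoryInventory backend = new InMemoryInventory();
        backend.put("bank", List.of("bank%1", "bank%2")); // made-up ids
        SenseInventory inventory = new CachingInventory(backend);
        System.out.println(inventory.sensesOf("bank"));
        System.out.println(inventory.sensesOf("bank"));
        System.out.println("backend lookups: " + backend.lookups);
    }
}
```

Because the techniques in component 3 would depend only on the
interface, a cached remote resource and a local one are interchangeable
from their point of view.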
> >
> > If you have any suggestions or recommendations, we would really
> > appreciate discussing them and would like your guidance to iterate on
> > this tool-set.
> >
> > Best regards,
> >
> > Anthony Beylerian, Mondher Bouazizi
>
