Re: Word Sense Disambiguator

2015-07-24 Thread Joern Kottmann
It would be nice if you could share instructions on how to run it.
I also would like to give it a try.

Jörn

On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian 
anthonybeyler...@hotmail.com wrote:

 Hello,
 Yes, for the moment we are only using WordNet for sense definitions. The
 plan is to complete the package by mid to late August, but if you like you
 can follow up on the progress from the sandbox.
 Best regards,
 Anthony
  Date: Thu, 23 Jul 2015 15:36:57 +0300
  Subject: Word Sense Disambiguator
  From: cristian.petro...@gmail.com
  To: dev@opennlp.apache.org
 
  Hi,
 
  I saw that there are people actively working on a Word Sense
 Disambiguator.
  Do you guys know when the module will be ready to use? Also, I assume that
  WordNet is used to define the disambiguated word meaning?
 
  Thanks,
  Cristian




RE: Word Sense Disambiguator

2015-07-24 Thread Anthony Beylerian
Hi,

To try out the ongoing implementations, check out the sandbox repository and
then follow these steps:
1- Create a models directory under the test resources:

- src
  - test
    - resources
      + models

2- Place the following pre-trained models and dictionary in that directory.
You can download them from [1], or train your own models.

{
en-token.bin,
en-pos-maxent.bin,
en-sent.bin,
en-ner-person.bin,
en-lemmatizer.dict
}

To train the IMS approach, you also need training data such as Senseval-3 [2].
For now, please add these folders:
- src
  - test
    - resources
      - supervised
        + raw
        + models
        + dictionary

You can find the data files here [2].
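
If you prefer to script the folder setup, here is a minimal sketch in plain Java. It assumes the two listings above mean src/test/resources/models and src/test/resources/supervised/{raw,models,dictionary}. The class name WsdResourceLayout is made up for this example and is not part of the sandbox.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the sandbox code.
class WsdResourceLayout {

    // Relative directories, as interpreted from the listings in this mail (an assumption).
    static final String[] DIRS = {
        "src/test/resources/models",
        "src/test/resources/supervised/raw",
        "src/test/resources/supervised/models",
        "src/test/resources/supervised/dictionary"
    };

    /** Creates every directory under the given project root and returns the resulting paths. */
    static List<Path> create(Path projectRoot) throws IOException {
        List<Path> created = new ArrayList<>();
        for (String dir : DIRS) {
            // createDirectories also creates missing parents and does nothing
            // if the directory already exists.
            created.add(Files.createDirectories(projectRoot.resolve(dir)));
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Pass the checkout directory as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : create(root)) {
            System.out.println("ready: " + p);
        }
    }
}
```

The model and data files from [1] and [2] still have to be copied into these folders by hand.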

3- We included two examples, [LeskTester.java] and [IMSTester.java], that you
can run directly. You can also write your own tests.

To run a custom test, you need at minimum a tokenized text or sentence.
For example, for Lesk:

  String[] words = Loader.getTokenizer().tokenize(sentence);

Choose the index of the word to disambiguate in the token array:

  int wordIndex = 6;

Then create a WSDisambiguator object, for example for Lesk:

  Lesk lesk = new Lesk();

Then call the default disambiguation method:

  lesk.disambiguate(words, wordIndex);

You will get an array of strings with the following format:

Lesk: [Source SenseKey Score]
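
As an illustration of working with that output, here is a small sketch that splits one result string into its three fields and picks the highest-scoring one. It assumes the fields are separated by whitespace and that the score parses as a decimal number, neither of which is confirmed here. The class WsdResult and the sample strings are made up for this example.

```java
// Hypothetical holder for one result string of the form "Source SenseKey Score".
class WsdResult {
    final String source;
    final String senseKey;
    final double score;

    WsdResult(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    /** Parses one result line, assuming three whitespace-separated fields. */
    static WsdResult parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 'Source SenseKey Score' but got: " + line);
        }
        return new WsdResult(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    /** Returns the result with the highest score, or null for an empty array. */
    static WsdResult best(String[] lines) {
        WsdResult top = null;
        for (String line : lines) {
            WsdResult r = parse(line);
            if (top == null || r.score > top.score) {
                top = r;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up sample values, not actual output of the disambiguator.
        String[] sample = {
            "WordNet bank%1:17:01:: 0.25",
            "WordNet bank%1:14:00:: 0.75"
        };
        WsdResult top = best(sample);
        System.out.println(top.senseKey + " scored " + top.score);
    }
}
```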

To read the sense definitions, you can use the method:
[opennlp.tools.disambiguator.Constants.printResults]

To use the variations of Lesk, you need to create and configure a
parameters object:

  LeskParameters leskParams = new LeskParameters();
  leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
  leskParams.setWin_b_size(4);
  leskParams.setDepth(3);
  lesk.setParams(leskParams);

Typically, IMS should perform better than Lesk. Lesk is a classic method, but
it is usually used as a baseline, along with the most frequent sense (MFS).
We will be testing and adding more techniques.
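
For readers new to Lesk, here is a rough, self-contained sketch of the general idea behind the simplified variant: pick the sense whose gloss shares the most words with the context window around the target word. This is not the sandbox implementation. The glosses are hand-written toy definitions rather than WordNet entries, and the class name, sense labels and stop-word list are all made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy illustration of gloss-overlap disambiguation; not the sandbox Lesk class.
class SimplifiedLeskSketch {

    // Tiny stop-word list, chosen only for this example.
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
        "a", "an", "the", "of", "to", "in", "on", "at", "and", "or", "that", "is", "as"));

    /** Lower-cases the text, splits on non-letters and drops stop words. */
    static Set<String> contentWords(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty() && !STOP.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }

    /** Hand-written toy glosses for two senses of "bank" (not WordNet data). */
    static Map<String, String> toyGlosses() {
        Map<String, String> glosses = new LinkedHashMap<>();
        glosses.put("bank#finance", "an institution that accepts deposits of money and makes loans");
        glosses.put("bank#river", "sloping land beside a body of water such as a river");
        return glosses;
    }

    /**
     * Returns the sense label whose gloss overlaps most with the words found
     * within `window` tokens on either side of tokens[wordIndex].
     * Ties go to the sense listed first in the map.
     */
    static String disambiguate(String[] tokens, int wordIndex, int window,
                               Map<String, String> glosses) {
        Set<String> context = new HashSet<>();
        int from = Math.max(0, wordIndex - window);
        int to = Math.min(tokens.length - 1, wordIndex + window);
        for (int i = from; i <= to; i++) {
            if (i != wordIndex) {
                context.addAll(contentWords(tokens[i]));
            }
        }
        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> e : glosses.entrySet()) {
            Set<String> shared = contentWords(e.getValue());
            shared.retainAll(context); // keep only the words also seen in the context
            if (shared.size() > bestOverlap) {
                bestOverlap = shared.size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] words = "I deposited money at the bank yesterday".split(" ");
        // "bank" is at index 5; "money" appears in the finance gloss only.
        System.out.println(disambiguate(words, 5, 4, toyGlosses())); // prints bank#finance
    }
}
```

Listing the senses in frequency order would make a tie fall back to the most frequent sense, which is one simple way to combine the two baselines mentioned above.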

In any case, please feel free to ask for more details.

Best,

Anthony

[1] : 
https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFUusp=sharing
[2] : 
https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing