Hi,
To try out the current implementations, check out the sandbox repository and
follow these steps:
1- Create a models directory in the test resources:
- src
  - test
    - resources
      + models
2- Include the following pre-trained models and dictionary in that directory.
You can download them from [1] or train your own models.
{
en-token.bin,
en-pos-maxent.bin,
en-sent.bin,
en-ner-person.bin,
en-lemmatizer.dict
}
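If it helps, here is a small sketch for checking that the files listed above are in place before running the examples. The class name ModelDirCheck and the helper missingFiles are made up for this mail and are not part of the sandbox code; the path assumes you run it from the checkout root.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the sandbox: reports missing model files.
final class ModelDirCheck {
    // File names copied from the list in step 2.
    static final String[] REQUIRED = {
        "en-token.bin", "en-pos-maxent.bin", "en-sent.bin",
        "en-ner-person.bin", "en-lemmatizer.dict"
    };

    /** Returns the required file names that are not regular files in modelsDir. */
    static List<String> missingFiles(Path modelsDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(modelsDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed location, relative to the checkout root.
        Path modelsDir = Paths.get("src", "test", "resources", "models");
        List<String> missing = missingFiles(modelsDir);
        System.out.println(missing.isEmpty()
            ? "All model files found."
            : "Missing: " + missing);
    }
}
```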
To train the IMS approach, you also need training data such as Senseval-3 [2].
For now, please add these folders:
- src
  - test
    - resources
      - supervised
        + raw
        + models
        + dictionary
You can find the data files here [2].
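The folders can of course be created by hand. As a convenience, a minimal sketch that creates them is below. SupervisedLayout and its create method are hypothetical names used only here, and the sketch assumes the three new folders all sit directly under supervised.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the sandbox: creates the supervised folders.
final class SupervisedLayout {
    // Assumption: raw, models and dictionary are siblings under "supervised".
    static final String[] SUBDIRS = {"raw", "models", "dictionary"};

    /** Creates resourcesDir/supervised/{raw,models,dictionary} and returns the supervised path. */
    static Path create(Path resourcesDir) throws IOException {
        Path supervised = resourcesDir.resolve("supervised");
        for (String sub : SUBDIRS) {
            // createDirectories also creates missing parents and accepts existing folders.
            Files.createDirectories(supervised.resolve(sub));
        }
        return supervised;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location, relative to the checkout root.
        Path created = create(Paths.get("src", "test", "resources"));
        System.out.println("Created folders under " + created);
    }
}
```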
3- We included two examples, [LeskTester.java] and [IMSTester.java], that you
can run directly, or you can write your own tests.
To run a custom test, you need at minimum a tokenized text or sentence. For
example, for Lesk:
    String[] words = Loader.getTokenizer().tokenize(sentence);
Choose the index of the word to disambiguate in the token array:
    int wordIndex = 6;
Then create a WSDisambiguator object, for example for Lesk:
    Lesk lesk = new Lesk();
and call the default disambiguation method:
    lesk.disambiguate(words, wordIndex);
You will get an array of strings with the following format:
Lesk : [Source SenseKey Score]
To read the sense definitions you can use the method:
[opennlp.tools.disambiguator.Constants.printResults]
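If you would rather process the result strings yourself, the sketch below splits one entry into its three fields. It assumes the fields are separated by whitespace and that the score parses as a number, neither of which is guaranteed by the format line above. SenseResult is a hypothetical class for illustration, and the sample entry in main is invented, not real output.

```java
// Hypothetical value class, not part of the sandbox: one parsed result entry.
final class SenseResult {
    final String source;
    final String senseKey;
    final double score;

    SenseResult(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    /** Parses "Source SenseKey Score", assuming whitespace-separated fields. */
    static SenseResult parse(String entry) {
        String[] parts = entry.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + parts.length);
        }
        return new SenseResult(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    public static void main(String[] args) {
        // Invented sample entry, only to show the shape of the data.
        SenseResult r = parse("WORDNET bank%1:17:01:: 2.0");
        System.out.println(r.source + " / " + r.senseKey + " / " + r.score);
    }
}
```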
To use the variations of Lesk, create and configure a parameters object:
    LeskParameters leskParams = new LeskParameters();
    leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
    leskParams.setWin_b_size(4);
    leskParams.setDepth(3);
    lesk.setParams(leskParams);
Typically, IMS should perform better than Lesk: Lesk is a classic method, but
it is usually used as a baseline, along with the most frequent sense (MFS).
We will be testing and adding more techniques.
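For anyone new to Lesk, the basic idea is to pick the sense whose definition shares the most words with the context. The sketch below is a toy version of that idea only, not the sandbox implementation: ToyLesk, its methods and the two glosses are made up for illustration, and it does no stop-word filtering or lemmatization.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of gloss overlap; not the sandbox Lesk class.
final class ToyLesk {
    /** Counts distinct lower-cased gloss words that also occur in the context. */
    static int overlap(String[] context, String gloss) {
        Set<String> ctx = new HashSet<>();
        for (String w : context) {
            ctx.add(w.toLowerCase());
        }
        Set<String> counted = new HashSet<>();
        int count = 0;
        for (String w : gloss.toLowerCase().split("\\s+")) {
            if (ctx.contains(w) && counted.add(w)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the index of the gloss with the largest overlap; ties go to the earliest gloss. */
    static int bestSense(String[] context, String[] glosses) {
        int best = 0;
        int bestScore = -1;
        for (int i = 0; i < glosses.length; i++) {
            int score = overlap(context, glosses[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] context = {"I", "deposited", "money", "at", "the", "bank", "yesterday"};
        // Invented glosses for two senses of "bank".
        String[] glosses = {
            "a financial institution that accepts deposits and lends money",
            "sloping land beside a body of water"
        };
        System.out.println("Chosen sense index: " + bestSense(context, glosses));
    }
}
```

With the sample above, the first gloss shares the word "money" with the context and the second shares nothing, so index 0 is chosen.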
In any case, please feel free to ask for more details.
Best,
Anthony
[1] :
https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFUusp=sharing
[2] :
https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
Date: Fri, 24 Jul 2015 09:54:02 +0200
Subject: Re: Word Sense Disambiguator
From: kottm...@gmail.com
To: dev@opennlp.apache.org
It would be nice if you could share instructions on how to run it.
I also would like to give it a try.
Jörn
On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian
anthonybeyler...@hotmail.com wrote:
Hello,
Yes, for the moment we are only using WordNet for sense definitions. The
plan is to complete the package by mid to late August, but if you like you
can follow the progress in the sandbox.
Best regards,
Anthony
Date: Thu, 23 Jul 2015 15:36:57 +0300
Subject: Word Sense Disambiguator
From: cristian.petro...@gmail.com
To: dev@opennlp.apache.org
Hi,
I saw that there are people actively working on a Word Sense
Disambiguator.
Do you know when the module will be ready to use? Also, I assume that
WordNet is used to define the disambiguated word meanings?
Thanks,
Cristian