Hi Anthony,

Thanks. I'd also be happy to help with whatever I can in order to bring this component to trunk as soon as possible.

On Mon, Nov 2, 2015 at 2:10 PM, Anthony Beylerian <[email protected]> wrote:

> Hello Cristian,
>
> Sorry for the late reply, I finally have a copy of a good corpus for
> coarse testing (OntoNotes).
> I will start working again on the component sometime this week.
>
> Best,
>
> Anthony
>
> > Date: Mon, 12 Oct 2015 15:24:46 +0300
> > Subject: Re: Word Sense Disambiguator
> > From: [email protected]
> > To: [email protected]
> >
> > Hi,
> >
> > Thanks Anthony for the info.
> > Does anybody else know when the WSD component will be merged into trunk,
> > and whether a release might be cut with it?
> >
> > Thanks
> >
> > On Sat, Sep 19, 2015 at 9:21 AM, Anthony Beylerian <[email protected]> wrote:
> >
> > > Hey Cristian,
> > >
> > > Sorry for the late reply, I am currently on summer break but will get
> > > back on it in one to two weeks.
> > >
> > > Can't really say when there will be a new release.
> > > This usually involves other components as well and it might take time
> > > to vote.
> > >
> > > However, some things to expect for the WSD component:
> > >
> > > - Support for the different types of classifiers for the supervised
> > >   approaches (right now only ME based).
> > > - Support for augmenting the general domain training with specific
> > >   domain information.
> > >
> > > Best,
> > >
> > > Anthony
> > >
> > > On Thu, Sep 17, 2015 at 11:18 PM, Cristian Petroaca <[email protected]> wrote:
> > >
> > > > Hi Anthony,
> > > >
> > > > Do you know when the WSD component will be available in an OpenNLP
> > > > release?
> > > >
> > > > Thanks,
> > > > Cristian
> > > >
> > > > On Thu, Sep 10, 2015 at 10:32 AM, Cristian Petroaca <[email protected]> wrote:
> > > >
> > > > > Yes, that's what I was looking for.
> > > > > Thanks Aliaksandr.
> > > > >
> > > > > On Wed, Sep 9, 2015 at 9:39 PM, Aliaksandr Autayeu <[email protected]> wrote:
> > > > >
> > > > > > Cristian, the reference you gave basically uses synset offsets - 1740 is
> > > > > > entity, 1930 is physical entity, etc. However, in YAGO they seem to have
> > > > > > added 100000000 to those offsets.
> > > > > >
> > > > > > Synset offset is the fastest way to get into the WordNet dictionary,
> > > > > > because it is a direct file offset. Offset alone is not enough though,
> > > > > > you also need POS - part of speech. Speed probably is the reason most
> > > > > > people access WordNet this way. However, offset is not the best "key",
> > > > > > especially for indexing, because offsets change as WordNet evolves.
> > > > > > SenseKeys (e.g. bank%1:14:00:: and bank%1:21:01::) should be more
> > > > > > suitable for indexing.
> > > > > >
> > > > > > If you're looking to connect with YAGO above, you might do something
> > > > > > along the lines of
> > > > > > ....getWordBySenseKey(sensekey).getSynset().getOffset and then add
> > > > > > 100000000 to get the YAGO ids.
> > > > > >
> > > > > > Aliaksandr
> > > > > >
> > > > > > On 9 September 2015 at 09:51, Cristian Petroaca <[email protected]> wrote:
> > > > > >
> > > > > > > I am looking for the Sense Id of the word. It has this format here :
> > > > > > >
> > > > > > > http://resources.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoWordnetIds.txt
> > > > > > >
> > > > > > > On Tue, Sep 8, 2015 at 6:47 PM, Anthony Beylerian <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Thanks it is still being improved.
> > > > > > > >
> > > > > > > > I am not sure what you mean by type or database ID.
> > > > > > > > Currently the sense source and the sense ID are returned.
> > > > > > > >
> > > > > > > > For example:
> > > > > > > >
> > > > > > > > "I went to the bank to deposit money."
> > > > > > > > target : bank (index : 4)
> > > > > > > > expected output : [WORDNET bank%1:14:00:: 21.6, WORDNET bank%1:21:01:: 5.8,... etc]
> > > > > > > >
> > > > > > > > Where "bank%1:14:00::" is a SenseKey which you can query WordNet with
> > > > > > > > to give you a sense definition.
> > > > > > > >
> > > > > > > > You can do this using the default dictionary :
> > > > > > > >
> > > > > > > >     Dictionary.getDefaultResourceInstance().getWordBySenseKey(sensekey).getSynset().getGloss();
> > > > > > > >
> > > > > > > > Hope this is what you are looking for, otherwise please clarify.
> > > > > > > >
> > > > > > > > Anthony Beylerian
> > > > > > > >
> > > > > > > > On Tue, Sep 8, 2015 at 5:34 PM, Cristian Petroaca <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Anthony,
> > > > > > > > >
> > > > > > > > > I had a chance to test the wsd component. That's great work. Thanks.
> > > > > > > > > One question, is it possible to return the wordnet type (or database
> > > > > > > > > id) of the disambiguated word?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Cristian
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 1:14 PM, Anthony Beylerian <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > To try out the ongoing implementations, after checking out the
> > > > > > > > > > sandbox repository please try these steps :
> > > > > > > > > >
> > > > > > > > > > 1- Create a resource models directory:
> > > > > > > > > >
> > > > > > > > > > - src
> > > > > > > > > >   - test
> > > > > > > > > >     - resources
> > > > > > > > > >       + models
> > > > > > > > > >
> > > > > > > > > > 2- Include the following pre-trained models and dictionary in that
> > > > > > > > > > directory:
> > > > > > > > > > You can find those here [1] if you like or pre-train your own models.
> > > > > > > > > >
> > > > > > > > > > {
> > > > > > > > > > en-token.bin,
> > > > > > > > > > en-pos-maxent.bin,
> > > > > > > > > > en-sent.bin,
> > > > > > > > > > en-ner-person.bin,
> > > > > > > > > > en-lemmatizer.dict
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > To train the IMS approach, you need to include training data such as
> > > > > > > > > > senseval3 [2].
> > > > > > > > > > For now, please add these folders :
> > > > > > > > > >
> > > > > > > > > > - src
> > > > > > > > > >   - test
> > > > > > > > > >     - resources
> > > > > > > > > >       - supervised
> > > > > > > > > >         + raw
> > > > > > > > > >         + models
> > > > > > > > > >         + dictionary
> > > > > > > > > >
> > > > > > > > > > You can find the data files here [2].
> > > > > > > > > >
> > > > > > > > > > 3- We included two examples [LeskTester.java] and [IMSTester.java]
> > > > > > > > > > that you can run directly, or make your own tests.
> > > > > > > > > >
> > > > > > > > > > To run a custom test, minimally you need to have a tokenized text or
> > > > > > > > > > sentence, for example for Lesk:
> > > > > > > > > >
> > > > > > > > > >     String[] words = Loader.getTokenizer().tokenize(sentence);
> > > > > > > > > >
> > > > > > > > > > Choose the index of the word to disambiguate in the token array.
> > > > > > > > > >
> > > > > > > > > >     int wordIndex = 6;
> > > > > > > > > >
> > > > > > > > > > Then just create a WSDisambiguator object, for example for Lesk :
> > > > > > > > > >
> > > > > > > > > >     Lesk lesk = new Lesk();
> > > > > > > > > >
> > > > > > > > > > And you can call the default disambiguation method
> > > > > > > > > >
> > > > > > > > > >     lesk.disambiguate(words, wordIndex);
> > > > > > > > > >
> > > > > > > > > > You will get an array of strings with the following format :
> > > > > > > > > >
> > > > > > > > > > Lesk : [Source SenseKey Score]
> > > > > > > > > >
> > > > > > > > > > To read the sense definitions you can use the method :
> > > > > > > > > > [opennlp.tools.disambiguator.Constants.printResults]
> > > > > > > > > >
> > > > > > > > > > For using the variations of Lesk, you will need to create and
> > > > > > > > > > configure a parameters object:
> > > > > > > > > >
> > > > > > > > > >     LeskParameters leskParams = new LeskParameters();
> > > > > > > > > >     leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
> > > > > > > > > >     leskParams.setWin_b_size(4);
> > > > > > > > > >     leskParams.setDepth(3);
> > > > > > > > > >     lesk.setParams(leskParams);
> > > > > > > > > >
> > > > > > > > > > Typically, IMS should perform better than Lesk, since Lesk is a
> > > > > > > > > > classic method, but it is usually used as a baseline along with the
> > > > > > > > > > most frequent sense (MFS).
> > > > > > > > > > However, we will be testing and adding more techniques.
> > > > > > > > > >
> > > > > > > > > > In any case, please feel free to ask for more details.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Anthony
> > > > > > > > > >
> > > > > > > > > > [1] : https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
> > > > > > > > > > [2] : https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
> > > > > > > > > >
> > > > > > > > > > > Date: Fri, 24 Jul 2015 09:54:02 +0200
> > > > > > > > > > > Subject: Re: Word Sense Disambiguator
> > > > > > > > > > > From: [email protected]
> > > > > > > > > > > To: [email protected]
> > > > > > > > > > >
> > > > > > > > > > > It would be nice if you could share instructions on how to run it.
> > > > > > > > > > > I also would like to give it a try.
> > > > > > > > > > >
> > > > > > > > > > > Jörn
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > > Yes for the moment we are only using WordNet for sense
> > > > > > > > > > > > definitions. The plan is to complete the package by mid to late
> > > > > > > > > > > > August, but if you like you can follow up on the progress from
> > > > > > > > > > > > the sandbox.
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Anthony
> > > > > > > > > > > >
> > > > > > > > > > > > > Date: Thu, 23 Jul 2015 15:36:57 +0300
> > > > > > > > > > > > > Subject: Word Sense Disambiguator
> > > > > > > > > > > > > From: [email protected]
> > > > > > > > > > > > > To: [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I saw that there are people actively working on a Word Sense
> > > > > > > > > > > > > Disambiguator.
> > > > > > > > > > > > > Do you guys know when the module will be ready to use? Also I
> > > > > > > > > > > > > assume that wordnet is used to define the disambiguated word
> > > > > > > > > > > > > meaning?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Cristian
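
The thread describes disambiguation results as strings of the form [Source SenseKey Score] (for example "WORDNET bank%1:14:00:: 21.6") and suggests that YAGO ids are WordNet synset offsets plus 100000000. Below is a minimal Java sketch of handling both, assuming the three fields are separated by whitespace. The class and method names (WsdResultSketch, parse, lemma, toYagoId) are hypothetical helpers for illustration only and are not part of the OpenNLP API.

```java
/**
 * Hypothetical helper, NOT part of OpenNLP: holds one disambiguation result
 * of the form "Source SenseKey Score" as described in the thread.
 */
final class WsdResultSketch {
    // Constant that YAGO appears to add to WordNet synset offsets (per the thread).
    static final long YAGO_OFFSET = 100_000_000L;

    final String source;    // e.g. "WORDNET"
    final String senseKey;  // e.g. "bank%1:14:00::"
    final double score;     // e.g. 21.6

    private WsdResultSketch(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    /** Assumption: exactly three whitespace-separated fields. */
    static WsdResultSketch parse(String result) {
        String[] parts = result.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected 'Source SenseKey Score' but got: " + result);
        }
        return new WsdResultSketch(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    /** The lemma is the part of a WordNet sense key before the '%' sign. */
    String lemma() {
        int percent = senseKey.indexOf('%');
        return percent < 0 ? senseKey : senseKey.substring(0, percent);
    }

    /** Maps a synset offset to a YAGO-style id by adding the constant above. */
    static long toYagoId(long synsetOffset) {
        if (synsetOffset < 0) {
            throw new IllegalArgumentException("Synset offset must be non-negative");
        }
        return YAGO_OFFSET + synsetOffset;
    }
}
```

With this sketch, parsing "WORDNET bank%1:14:00:: 21.6" gives the sense key bank%1:14:00::, which could then be passed to the getWordBySenseKey call quoted in the thread, and toYagoId(1740) gives 100001740 for the "entity" offset mentioned there. As noted in the thread, an offset alone does not identify a synset without its part of speech, and offsets change between WordNet versions.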
