I agree with you. WSD should be included in OpenNLP once it reaches reasonably good performance. On the other hand, I have seen few libraries or APIs that do WSD, and almost none that do it well. That may be an indication of how hard the problem is.
The only promising API I found is Babelfy: http://babelfy.org/about. It uses a graph-based model built on their BabelNet knowledge base to predict word senses. I think it is based on this paper: http://www.aclweb.org/anthology/Q14-1019.
Any thoughts on this?

On Sat, Feb 24, 2018 at 7:49 PM, Anthony Beylerian <anthony.beyler...@gmail.com> wrote:

> Hey Cristian,
>
> We have tried different approaches such as:
>
> - Lesk (original) [1]
> - Most frequent sense from the data (MFS)
> - Extended Lesk (with different scoring functions)
> - It makes sense (IMS) [2]
> - A sense clustering approach (I don't immediately recall the reference)
>
> Lesk and MFS are meant to be used as baselines for evaluation purpose only.
> The extended version of Lesk is an effort to improve the original, through
> additional information from semantic relationships.
> Although it's not very accurate, it could be useful since it is an
> unsupervised method (no need for large training data).
> However, there were some caveats, as both approaches need to pre-load
> dictionaries as well as score a semantic graph from WordNet at runtime.
>
> IMS is a supervised method which we were hoping to mainly use, since it
> scored around 80% accuracy on SemEval, however that is only for the
> coarse-grained case. However, in reality words have various degrees of
> polysemy, and when tested in the fine-grained case the results were much
> lower.
> We have also experimented with a simple clustering approach but the
> improvements were not considerable as far as I remember.
>
> I just checked the latest results on Semeval2015 [3] and they look a bit
> improved on the fine-grained case ~65% F1.
> However, in some particular domains it looks like the accuracy increases,
> so it could depend on the use case.
>
> On the other hand, there could be some more recent studies that could yield
> better results, but that would need some more investigation.
>
> There are also some other issues such as lack of direct multi-lingual
> support from WordNet, missing sense definitions etc.
> We were also still looking for a better source of sense definitions back
> then.
> In any case, I believe it would be better to have higher performance before
> putting this in the official distribution, however that highly depends on
> the team.
> Otherwise, different parts of the code just need some simple refactoring as
> well.
>
> Best,
>
> Anthony
>
> [1] : M. Lesk, Automatic sense disambiguation using machine readable
> dictionaries
> [2] : https://www.comp.nus.edu.sg/~nght/pubs/ims.pdf
> [3] : http://alt.qcri.org/semeval2015/task13/index.php?id=results
>
> On Wed, Feb 21, 2018 at 5:26 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
> > Hi Anthony,
> >
> > I'd be interested to discuss this further.
> > What are the wsd methods used? Any links to papers?
> > How does the module perform when being evaluated against Senseval?
> >
> > How much work do you think it's necessary in order to have a functioning
> > WSD module in the context of OpenNLP?
> >
> > Thanks,
> > Cristian
> >
> > On Tue, Feb 20, 2018 at 8:09 AM, Anthony Beylerian <anthony.beyler...@gmail.com> wrote:
> >
> >> Hi Cristian,
> >>
> >> Thank you for your interest.
> >>
> >> The WSD module is currently experimental, so as far as I am aware there
> >> is no timeline for it.
> >>
> >> You can find the sandboxed version here:
> >> https://github.com/apache/opennlp-sandbox/tree/master/opennlp-wsd
> >>
> >> I personally didn't have the time to revisit this for a while and there
> >> are still some details to work out.
> >> But if you are really interested, you are welcome to discuss and
> >> contribute.
> >> I will assist as much as possible.
> >>
> >> Best,
> >>
> >> Anthony
> >>
> >> On Sun, Feb 18, 2018 at 5:52 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm interested in word sense disambiguation (particularly based on
> >>> Wordnet). I noticed that the latest OpenNLP version doesn't have any but I
> >>> remember that a couple of years ago there was somebody working on
> >>> implementing it. Why isn't it in the official OpenNLP jar? Is there a
> >>> timeline for adding it?
> >>>
> >>> Thanks,
> >>> Cristian
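
For readers unfamiliar with the two baselines mentioned in the thread (Lesk and MFS), here is a minimal sketch of how they can fit together. It is not the code from the opennlp-wsd sandbox. It assumes the simplified Lesk variant (overlap between the context and each sense gloss) rather than Lesk's original gloss-to-gloss comparison, and it is written in Python only for brevity. The sense inventory, the sense ids, the stopword list and all function names are made up for illustration; a real setup would read glosses from WordNet.

```python
from collections import Counter

# Hypothetical toy sense inventory: sense id -> gloss (NOT taken from WordNet).
SENSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a river or lake",
    }
}

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "to", "in", "on", "by", "at"}


def tokens(text):
    """Lowercase, split on whitespace, drop stopwords (no punctuation handling)."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def most_frequent_sense(word, tagged_corpus):
    """MFS baseline: the sense seen most often for `word` in (word, sense) pairs."""
    counts = Counter(sense for w, sense in tagged_corpus if w == word)
    return counts.most_common(1)[0][0] if counts else None


def simplified_lesk(word, context, tagged_corpus=()):
    """Pick the sense whose gloss shares the most words with the context.

    Falls back to the MFS baseline when no gloss overlaps with the context.
    """
    ctx = tokens(context)
    best, best_overlap = None, 0
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(ctx & tokens(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    if best is not None:
        return best
    return most_frequent_sense(word, tagged_corpus)
```

For example, simplified_lesk("bank", "she sat by the river and watched the water") picks the river sense because "river" appears in that gloss, while a context with no gloss overlap falls back to whichever sense is most frequent in the tagged data. The extended Lesk variant discussed above would additionally score glosses of semantically related senses, which is not shown here.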