Parser performance bug

2015-02-16 Thread Joern Kottmann
Hi all,

the performance of the parser changed a bit. The output of the current
version in 1.6.0 RC2 is different from the output of the 1.5.3 release,
even though there shouldn't be any difference as far as I can see.

The question of what caused that difference came up and I started to
bisect it.

Here are my results so far:
1655561 -> 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (head)
1591889 -> 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (5/2/14)
1576093 -> 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/10/14)
1574819 -> 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/6/14)
1574524 -> 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/5/14)
1574505 -> 93c912e100932384465ec740d144a94656f214d3 (3/5/14)
1573000 -> 93c912e100932384465ec740d144a94656f214d3 (2/28/14)
1569434 -> 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1569285 -> 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1554795 -> 93c912e100932384465ec740d144a94656f214d3 (1/2/14)
1463979 -> 93c912e100932384465ec740d144a94656f214d3 (1.5.3)

The first column is the svn revision, the second column is the hash of
the output data, and the date of the revision (or the release version)
is in parentheses.
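Comparing a short hash per run avoids diffing the full parser output. The thread does not say which digest was used; the 40-hex-digit values are consistent with SHA-1, so this minimal Java sketch assumes SHA-1. The class `OutputHash` and the output file names are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OutputHash {
    // Lowercase hex SHA-1 digest of the given bytes.
    // Assumption: SHA-1, inferred only from the 40-hex-digit hashes above.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hash each parser output file named on the command line.
        for (String file : args) {
            System.out.println(sha1Hex(Files.readAllBytes(Path.of(file))) + "  " + file);
        }
    }
}
```

For example, `java OutputHash parse-1.5.3.txt parse-1.6.0.txt` prints one hash per output file; identical hashes mean byte-identical output.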

The change in the code which caused the difference happened in 1574524.
I had a quick look there and couldn't see within a few minutes what
caused the issue. I will probably again use a more systematic approach
to find the exact change in that commit that causes the difference.
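Finding the first revision whose output hash differs from the 1.5.3 baseline is a binary search over the ordered revisions. This is a minimal Java sketch, assuming the revisions are sorted ascending and the output changed exactly once in the range. `BisectOutput` and `firstChangedRevision` are hypothetical names, and the demo replays the hashes recorded above instead of building and running each revision:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

public class BisectOutput {
    // Returns the first revision whose output hash differs from that of the
    // oldest revision. Assumes ascending order, that the last revision differs
    // from the first, and that the output changed only once in between.
    static long firstChangedRevision(List<Long> revisions, Function<Long, String> hashOf) {
        String baseline = hashOf.apply(revisions.get(0));
        int good = 0;                   // index known to match the baseline
        int bad = revisions.size() - 1; // index known to differ from it
        while (bad - good > 1) {
            int mid = (good + bad) >>> 1;
            if (hashOf.apply(revisions.get(mid)).equals(baseline)) {
                good = mid;
            } else {
                bad = mid;
            }
        }
        return revisions.get(bad);
    }

    public static void main(String[] args) {
        String oldHash = "93c912e100932384465ec740d144a94656f214d3";
        String newHash = "1fe53c0aeaae1eb978dbb83f34b13944f2692b1f";
        long[] revs = {1463979, 1554795, 1569285, 1569434, 1573000, 1574505,
                       1574524, 1574819, 1576093, 1591889, 1655561};
        String[] hashes = {oldHash, oldHash, oldHash, oldHash, oldHash, oldHash,
                           newHash, newHash, newHash, newHash, newHash};
        // Stand-in for "check out, run the parser, hash the output".
        TreeMap<Long, String> observed = new TreeMap<>();
        for (int i = 0; i < revs.length; i++) {
            observed.put(revs[i], hashes[i]);
        }
        List<Long> sorted = new ArrayList<>(observed.keySet());
        System.out.println(firstChangedRevision(sorted, observed::get)); // prints 1574524
    }
}
```

In a real run, `hashOf` would do the expensive work (build the revision, parse the test data, hash the output); each probe halves the remaining range, so the 11 revisions above need at most 4 probes.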

Jörn




Re: Word Sense Disambiguation

2015-02-16 Thread Joern Kottmann
On Sat, 2015-02-14 at 11:09 +0100, Aliaksandr Autayeu wrote:
> Since you're perhaps deeper in this than others, you seem to be the best
> candidate to make a proposal, to check the state-of-the-art algorithms
> and devise a general enough interface for all or most of them. One way
> could be to see what the algorithms typically require, how diverse the
> sources of senses are (WordNet alone has multiple different interfaces
> to access it), and which options the algorithms take, and start there
> to make sure the interface is flexible enough to accommodate that
> diversity, has the ability to do some built-in checks (such as detecting
> the case of an algorithm trained on one source of senses working with
> another, or an algorithm relying on a relation which is missing in the
> sense source) and is similar to the rest of OpenNLP. We might even end
> up with two interfaces (e.g. one for the sense provider and one for WSD
> itself).
> 
> What do you think about this way?

Please propose an interface. We will discuss it here on the list.

Jörn



Re: Word Sense Disambiguation

2015-02-16 Thread Aliaksandr Autayeu
Jörn, just to avoid ambiguity in case you were asking me to propose a WSD
interface: I'd prefer Anthony to come up with a proposal, because he is
closer to the multiple WSD algorithms that would be nice to include in the
analysis.

Aliaksandr

On 16 February 2015 at 15:19, Joern Kottmann  wrote:

> On Sat, 2015-02-14 at 11:09 +0100, Aliaksandr Autayeu wrote:
> > Since you're perhaps deeper in this than others, you seem to be the best
> > candidate to make a proposal, to check the state-of-the-art algorithms
> > and devise a general enough interface for all or most of them. One way
> > could be to see what the algorithms typically require, how diverse the
> > sources of senses are (WordNet alone has multiple different interfaces
> > to access it), and which options the algorithms take, and start there
> > to make sure the interface is flexible enough to accommodate that
> > diversity, has the ability to do some built-in checks (such as detecting
> > the case of an algorithm trained on one source of senses working with
> > another, or an algorithm relying on a relation which is missing in the
> > sense source) and is similar to the rest of OpenNLP. We might even end
> > up with two interfaces (e.g. one for the sense provider and one for WSD
> > itself).
> >
> > What do you think about this way?
>
> Please propose an interface. We will discuss it here on the list.
>
> Jörn
>
>


Re: Word Sense Disambiguation

2015-02-16 Thread Joern Kottmann
On Mon, 2015-02-16 at 16:29 +0100, Aliaksandr Autayeu wrote:
> Jörn, just to avoid ambiguity in case you were asking me to propose a WSD
> interface: I'd prefer Anthony to come up with a proposal, because he is
> closer to the multiple WSD algorithms that would be nice to include in the
> analysis.

Sorry for being unclear; yes, I addressed Anthony. But everybody who has
an opinion is very welcome to join the discussion or propose something.

Jörn