Re: Parser performance bug

2015-03-09 Thread Joern Kottmann
On Fri, 2015-03-06 at 21:07 +0100, Joern Kottmann wrote:
 The parser still uses the old style of setting the beam size via the
 constructor. Due to the changes to move that to the training time it
 doesn't work anymore. The parser has to be changed to set the beam size
 during training time instead.


I committed a fix for this under OPENNLP-763. The parser should now work
again as it did in 1.5.3.

Jörn




Re: Parser performance bug

2015-03-06 Thread Joern Kottmann
Hello,

I made some progress with this. The problem is caused by the handling of
the beam size for the POS Tagger.

One way to set the beam size is to include it in the training params.
This is the only way that works properly with the redesign of
the ml package.
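
To illustrate the idea of carrying the beam size in the training params,
here is a minimal, self-contained sketch. It does not use the OpenNLP API:
the class name, the "BeamSize" key and the default of 3 are assumptions
chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the OpenNLP API. The key name and the
// default value below are assumptions made for this example.
final class BeamSizeParams {
    static final String BEAM_SIZE_KEY = "BeamSize"; // assumed key name
    static final int DEFAULT_BEAM_SIZE = 3;          // assumed default

    /** Reads the beam size from the training params, or returns the default. */
    static int beamSize(Map<String, String> trainParams) {
        String value = trainParams.get(BEAM_SIZE_KEY);
        if (value == null) {
            return DEFAULT_BEAM_SIZE;
        }
        int size = Integer.parseInt(value.trim());
        if (size < 1) {
            throw new IllegalArgumentException("beam size must be >= 1: " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(BEAM_SIZE_KEY, "10");
        // The value travels with the training params instead of being
        // passed to a constructor at runtime.
        System.out.println(beamSize(params));
    }
}
```

With this shape, a component that does not use a beam search can simply
ignore the entry, which is not possible with a constructor argument.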

In 1.6.0 it is possible to specify a user-implemented classifier, and not
all classifiers use BeamSearch. The parameter doesn't make sense
without BeamSearch. Therefore all the constructors where the beam size
param can be specified should be deprecated or removed.

Anyway, this way of setting the beam size doesn't work due to various
smaller issues in the code. I fixed that in OPENNLP-762.

The parser still uses the old style of setting the beam size via the
constructor. Due to the changes to move that to the training time it
doesn't work anymore. The parser has to be changed to set the beam size
during training time instead.

Jörn





Re: Parser performance bug

2015-02-20 Thread William Colen
I might be totally wrong, but I have a feeling that the change is
in ChunkerModel.java, because I also noticed a change in the Chunker tool
results. It could be related to the changes to the parameters in
that file. We can't rule out the possibility that the changes fixed
an existing bug.


Regards,
William


Parser performance bug

2015-02-16 Thread Joern Kottmann
Hi all,

the performance of the parser changed a bit. The output of the current
version in 1.6.0 RC2 is different from the output of the 1.5.3 release,
even though there shouldn't have been any difference as far as I can see.

The question of what caused that difference came up and I started to
bisect it.

Here are my results so far:
1655561 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (head)
1591889 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (5/2/14)
1576093 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/10/14)
1574819 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/6/14)
1574524 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/5/14)
1574505 - 93c912e100932384465ec740d144a94656f214d3 (3/5/14)
1573000 - 93c912e100932384465ec740d144a94656f214d3 (2/28/14)
1569434 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1569285 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1554795 - 93c912e100932384465ec740d144a94656f214d3 (1/2/14)
1463979 - 93c912e100932384465ec740d144a94656f214d3 (1.5.3)

The first column is the svn revision, the second column is the hash of the
output data, and in parentheses is the date of the revision or the
version.
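
For anyone who wants to fingerprint parser output in the same way, here is
a minimal sketch. It assumes SHA-1, which matches the 40 hex digits in the
table but is not confirmed by the message; the class and method names are
made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; SHA-1 is an assumption based on the hash length.
final class OutputHash {
    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hash the file given as the first argument, or a small demo string.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "demo parser output".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha1Hex(data));
    }
}
```

Two revisions that produce byte-identical output get the same hash, so
comparing one short string per revision is enough to spot a change.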

The change in the code which caused the difference happened in 1574524.
I had a quick look there and couldn't see within a few minutes what
caused the issue. I will probably use a more systematic approach again
to find the exact change in that commit that causes the difference.
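
The step from the table to revision 1574524 can be sketched as follows.
This is a plain scan over results that were already collected, not the
bisection itself; the class and method names are hypothetical.

```java
// Hypothetical sketch: finds the first revision whose output hash differs
// from the baseline, given results sorted by ascending revision.
final class FirstChange {
    /** Returns the first revision whose hash differs from hashes[0], or -1. */
    static int firstChangedRevision(int[] revisions, String[] hashes) {
        if (revisions.length != hashes.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        for (int i = 1; i < revisions.length; i++) {
            if (!hashes[i].equals(hashes[0])) {
                return revisions[i];
            }
        }
        return -1; // no difference found
    }

    public static void main(String[] args) {
        String old = "93c912e100932384465ec740d144a94656f214d3";
        String changed = "1fe53c0aeaae1eb978dbb83f34b13944f2692b1f";
        // Data from the table above, oldest revision first.
        int[] revisions = {1463979, 1554795, 1569285, 1569434, 1573000,
                           1574505, 1574524, 1574819, 1576093, 1591889, 1655561};
        String[] hashes = {old, old, old, old, old, old,
                           changed, changed, changed, changed, changed};
        System.out.println(firstChangedRevision(revisions, hashes));
    }
}
```

The scan only narrows the change down to the tested revisions; any
untested revision between 1574505 and 1574524 would still need a run.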

Jörn