Re: Parser performance bug
On Fri, 2015-03-06 at 21:07 +0100, Joern Kottmann wrote:
> The parser still uses the old style of setting the beam size via the constructor. Because that setting was moved to training time, it no longer works. The parser has to be changed to set the beam size at training time instead.

I committed a fix for this under OPENNLP-763. The parser should now work again as it did in 1.5.3.

Jörn
Re: Parser performance bug
Hello,

I made some progress with this. The problem is caused by the handling of the beam size for the POS Tagger.

One way to set the beam size is to include it in the training params. This is the only way that works properly with the redesign of the ml package. In 1.6.0 it is possible to specify a user-implemented classifier, and not all classifiers use BeamSearch. The parameter doesn't make sense without BeamSearch. Therefore all the constructors where the beam size param can be specified should be deprecated or removed.

Anyway, this way of setting the beam size didn't work due to various smaller issues in the code. I fixed that in OPENNLP-762.

The parser still uses the old style of setting the beam size via the constructor. Because that setting was moved to training time, it no longer works. The parser has to be changed to set the beam size at training time instead.

Jörn

On Sat, 2015-02-21 at 02:13 -0200, William Colen wrote:
> I might be totally wrong, but I have a feeling that the change is in ChunkerModel.java, because I also noticed a change in the Chunker tool results. It could be somehow related to the changes in the parameters in that file. We can't rule out the possibility that there was a bug that was fixed by the changes.
>
> Regards,
> William
>
> 2015-02-16 12:17 GMT-02:00 Joern Kottmann kottm...@gmail.com:
>> Hi all,
>>
>> the performance of the parser changed a bit. The output of the current version, 1.6.0 RC2, differs from the output of the 1.5.3 release, even though there shouldn't have been any difference as far as I can see.
>>
>> The question of what caused that difference came up, and I started to bisect it. Here are my results so far:
>>
>> 1655561 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (head)
>> 1591889 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (5/2/14)
>> 1576093 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/10/14)
>> 1574819 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/6/14)
>> 1574524 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/5/14)
>> 1574505 - 93c912e100932384465ec740d144a94656f214d3 (3/5/14)
>> 1573000 - 93c912e100932384465ec740d144a94656f214d3 (2/28/14)
>> 1569434 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
>> 1569285 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
>> 1554795 - 93c912e100932384465ec740d144a94656f214d3 (1/2/14)
>> 1463979 - 93c912e100932384465ec740d144a94656f214d3 (1.5.3)
>>
>> The first column is the svn revision, the second column is the hash of the output data, and the value in parentheses is the date of the revision or the version.
>>
>> The change in the code that caused the difference happened in 1574524. I had a quick look there and couldn't see within a few minutes what caused the issue. I will probably use a more systematic approach again to find the exact change in that commit that causes the difference.
>>
>> Jörn
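As background for why a beam size that is silently ignored can change the parser output, here is a minimal toy sketch of beam search. It is not OpenNLP's BeamSearch class: the probability table, the tag names, and the function beam_search are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical toy model: TABLE[prefix][tag] = P(tag | prefix).
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}


def beam_search(table, length, beam_size):
    """Return the best (sequence, log_prob) found with the given beam width."""
    beam = [((), 0.0)]  # start with the empty sequence, log-probability 0
    for _ in range(length):
        candidates = []
        for seq, score in beam:
            for tag, p in table[seq].items():
                candidates.append((seq + (tag,), score + math.log(p)))
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


narrow = beam_search(TABLE, 2, beam_size=1)
wide = beam_search(TABLE, 2, beam_size=2)
print(narrow[0], wide[0])  # prints: ('A', 'x') ('B', 'x')
```

With a beam of 1 the search commits to "A" after the first step and ends at ("A", "x") with probability 0.33; with a beam of 2 it keeps "B" alive and finds ("B", "x") with probability 0.36. The same model therefore produces different output depending only on the beam width, which is the kind of difference that appears when a configured beam size is replaced by a default.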
Re: Parser performance bug
I might be totally wrong, but I have a feeling that the change is in ChunkerModel.java, because I also noticed a change in the Chunker tool results. It could be somehow related to the changes in the parameters in that file. We can't rule out the possibility that there was a bug that was fixed by the changes.

Regards,
William

2015-02-16 12:17 GMT-02:00 Joern Kottmann kottm...@gmail.com:
> Hi all,
>
> the performance of the parser changed a bit. The output of the current version, 1.6.0 RC2, differs from the output of the 1.5.3 release, even though there shouldn't have been any difference as far as I can see.
>
> The question of what caused that difference came up, and I started to bisect it. Here are my results so far:
>
> 1655561 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (head)
> 1591889 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (5/2/14)
> 1576093 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/10/14)
> 1574819 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/6/14)
> 1574524 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/5/14)
> 1574505 - 93c912e100932384465ec740d144a94656f214d3 (3/5/14)
> 1573000 - 93c912e100932384465ec740d144a94656f214d3 (2/28/14)
> 1569434 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
> 1569285 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
> 1554795 - 93c912e100932384465ec740d144a94656f214d3 (1/2/14)
> 1463979 - 93c912e100932384465ec740d144a94656f214d3 (1.5.3)
>
> The first column is the svn revision, the second column is the hash of the output data, and the value in parentheses is the date of the revision or the version.
>
> The change in the code that caused the difference happened in 1574524. I had a quick look there and couldn't see within a few minutes what caused the issue. I will probably use a more systematic approach again to find the exact change in that commit that causes the difference.
>
> Jörn
Parser performance bug
Hi all,

the performance of the parser changed a bit. The output of the current version, 1.6.0 RC2, differs from the output of the 1.5.3 release, even though there shouldn't have been any difference as far as I can see.

The question of what caused that difference came up, and I started to bisect it. Here are my results so far:

1655561 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (head)
1591889 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (5/2/14)
1576093 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/10/14)
1574819 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/6/14)
1574524 - 1fe53c0aeaae1eb978dbb83f34b13944f2692b1f (3/5/14)
1574505 - 93c912e100932384465ec740d144a94656f214d3 (3/5/14)
1573000 - 93c912e100932384465ec740d144a94656f214d3 (2/28/14)
1569434 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1569285 - 93c912e100932384465ec740d144a94656f214d3 (2/18/14)
1554795 - 93c912e100932384465ec740d144a94656f214d3 (1/2/14)
1463979 - 93c912e100932384465ec740d144a94656f214d3 (1.5.3)

The first column is the svn revision, the second column is the hash of the output data, and the value in parentheses is the date of the revision or the version.

The change in the code that caused the difference happened in 1574524. I had a quick look there and couldn't see within a few minutes what caused the issue. I will probably use a more systematic approach again to find the exact change in that commit that causes the difference.

Jörn
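The bisection over output hashes described in this post can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function name first_changed_revision and the lookup table RECORDED are assumptions, and in practice the hash lookup would be replaced by building the revision, running the parser, and hashing its output. It also assumes the output changes exactly once across the revision range.

```python
# Revision -> output hash, copied from the results listed in the post.
OLD = "93c912e100932384465ec740d144a94656f214d3"
NEW = "1fe53c0aeaae1eb978dbb83f34b13944f2692b1f"
RECORDED = {
    1463979: OLD, 1554795: OLD, 1569285: OLD, 1569434: OLD,
    1573000: OLD, 1574505: OLD, 1574524: NEW, 1574819: NEW,
    1576093: NEW, 1591889: NEW, 1655561: NEW,
}


def first_changed_revision(revisions, output_hash):
    """Binary-search sorted revisions for the first one whose output hash
    differs from the hash of the oldest revision (None if nothing changed)."""
    baseline = output_hash(revisions[0])
    lo, hi = 0, len(revisions) - 1
    if output_hash(revisions[hi]) == baseline:
        return None
    # Invariant: revisions[lo] matches the baseline, revisions[hi] does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if output_hash(revisions[mid]) == baseline:
            lo = mid
        else:
            hi = mid
    return revisions[hi]


result = first_changed_revision(sorted(RECORDED), RECORDED.__getitem__)
print(result)  # prints: 1574524
```

On the recorded data the search narrows to 1574524 as the first tested revision with a different hash, with 1574505 as the last tested revision that still matches 1.5.3.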