Re: OpenNLP 1.5.3 RC 2 ready for testing
On 04/03/2013 02:10 AM, William Colen wrote:
> Thank you, Jörn. I also had to update the maven-changes-plugin version. Version 2.3 was failing to download the issue list. Changing to the latest version solved the issue.

The date in the NOTICE file still says 2011; that needs to be changed to 2013.

Jörn
Re: OpenNLP 1.5.3 RC 2 ready for testing
Thank you, I fixed it. I will start the build of RC3 right now.

On Wed, Apr 3, 2013 at 5:01 AM, Jörn Kottmann kottm...@gmail.com wrote:
> On 04/03/2013 02:10 AM, William Colen wrote:
>> Thank you, Jörn. I also had to update the maven-changes-plugin version. Version 2.3 was failing to download the issue list. Changing to the latest version solved the issue.
>
> The date in the NOTICE file still says 2011; that needs to be changed to 2013.
>
> Jörn
Re: OpenNLP 1.5.3 RC 2 ready for testing
Before you build, we should either commit OPENNLP-564 or remove it from the issue list. Should I quickly commit the rules file?

Jörn

On 04/03/2013 01:23 PM, William Colen wrote:
> Thank you, I fixed it. I will start the build of RC3 right now.
>
> On Wed, Apr 3, 2013 at 5:01 AM, Jörn Kottmann kottm...@gmail.com wrote:
>> On 04/03/2013 02:10 AM, William Colen wrote:
>>> Thank you, Jörn. I also had to update the maven-changes-plugin version. Version 2.3 was failing to download the issue list. Changing to the latest version solved the issue.
>>
>> The date in the NOTICE file still says 2011; that needs to be changed to 2013.
>>
>> Jörn
Re: OpenNLP 1.5.3 RC 2 ready for testing
In fact, you already fixed the year. Thank you. Yes, I can start the build after that.

On Wed, Apr 3, 2013 at 8:26 AM, Jörn Kottmann kottm...@gmail.com wrote:
> Before you build, we should either commit OPENNLP-564 or remove it from the issue list. Should I quickly commit the rules file?
>
> Jörn
>
> On 04/03/2013 01:23 PM, William Colen wrote:
>> Thank you, I fixed it. I will start the build of RC3 right now.
>>
>> On Wed, Apr 3, 2013 at 5:01 AM, Jörn Kottmann kottm...@gmail.com wrote:
>>> On 04/03/2013 02:10 AM, William Colen wrote:
>>>> Thank you, Jörn. I also had to update the maven-changes-plugin version. Version 2.3 was failing to download the issue list. Changing to the latest version solved the issue.
>>>
>>> The date in the NOTICE file still says 2011; that needs to be changed to 2013.
>>>
>>> Jörn
Re: OpenNLP 1.5.3 RC 2 ready for testing
The test plan shows that the issue list was not generated. The problem is that the version no longer matches: the new version id for 1.5.3 in opennlp-distr/pom.xml should be 12319040. See this link:
https://issues.apache.org/jira/browse/OPENNLP/fixforversion/12319040

I already updated the pom and committed the change.

Jörn

On 03/08/2013 03:11 PM, William Colen wrote:
> Hi all,
>
> Our second release candidate is ready for testing. RC1 failed to pass the initial quality check.
>
> The RC 2 can be downloaded from here:
> http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/
>
> To use it in a maven build, set the version for opennlp-tools or opennlp-uima to 1.5.3, and for opennlp-maxent to 3.0.3, and add this URL to your settings.xml file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-005/
>
> The current test plan can be found here:
> https://cwiki.apache.org/OPENNLP/testplan153.html
>
> Please sign up for tasks in the test plan.
>
> The release plan can be found here:
> https://cwiki.apache.org/OPENNLP/releaseplanandtasks153.html
>
> The RC contains quite a few changes; please refer to the included issue list for details.
>
> William
Re: Liblinear (was: OpenNLP 1.5.3 RC 2 ready for testing)
I used the Java port. I actually pulled it into nak as nak.liblinear because the model read/write code serializes to text files, and I needed access to the Model member fields in order to do the serialization the way I wanted. Otherwise it remains as is.

With a little bit of adaptation, you could provide a Java wrapper in OpenNLP that follows the same pattern as my Scala stuff. You'd just need to make it implement AbstractModel, which shouldn't be too hard. (I have it implement LinearModel, which is just a slight modification of MaxentModel, and I changed all uses of AbstractModel to LinearModel in Chalk [the opennlp.tools portion].)

-j

On Fri, Mar 22, 2013 at 9:32 AM, Jörn Kottmann kottm...@gmail.com wrote:
> Sounds interesting. I hope we will find the time to do that in OpenNLP after the 1.5.3 release too. We already discussed this, and I think we had consensus on making the machine learning pluggable and then offering a few addons for existing libraries.
>
> Good to know that liblinear works well. As far as I know it's written in C/C++; did you use the Java port of it, or did you write a JNI interface?
>
> Jörn
>
> On 03/22/2013 03:08 PM, Jason Baldridge wrote:
>> BTW, I've just recently finished integrating Liblinear into Nak (which is an adaptation of the maxent portion of OpenNLP). I'm still rounding some things out, but so far it is producing more accurate models that are trained in less time and without using cutoffs. Here's the code:
>> https://github.com/scalanlp/nak
>>
>> It is still mostly Java, but the liblinear adaptors are in Scala. I've kept things such that liblinear retrofits to the interfaces that were in opennlp.maxent, though given how well it is working, I'll be stripping those out and going with liblinear for everything in upcoming versions.
>>
>> Happy to answer any questions or help out with any of the above if it might be useful!
--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
Re: OpenNLP 1.5.3 RC 2 ready for testing
I've finished the testing with the Name Finders. The results were improved slightly. In the English models, the tagger mistakenly tagged adjacent names as having the same type when in fact the model had categorized the names correctly. There were 3 sentences with an incorrect adjacent tag in the 1.5.2 release that are now fixed, which improved the score slightly for the 1.5.3 release. I'm not sure how many sentences were affected in the other models.

This only affected the name finder that was trained to categorize all the names together in one model. Models trained to find only one type of name were not affected by this change, because any adjacently tagged item would be the same type anyway.

James
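To make the back-to-back case concrete, here is a minimal Python sketch of how per-token outcomes might be turned into typed name spans. It is an illustration only and not the OpenNLP code: the function decode_spans, the reuse_previous_type switch, and the outcome strings ("person-start", "person-cont", "misc-start", "other") are assumptions made for this example. The switch imitates the kind of error described above, where a name that directly follows another name inherits the previous name's type.

```python
from typing import List, Optional, Tuple

# (start index, end index exclusive, name type)
Span = Tuple[int, int, str]


def decode_spans(outcomes: List[str], reuse_previous_type: bool = False) -> List[Span]:
    """Turn per-token outcomes into typed spans.

    Hypothetical sketch: with reuse_previous_type=True, a name that starts
    directly after another name keeps the previous name's type, which
    imitates the back-to-back tagging error discussed in this thread.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    name_type: Optional[str] = None

    for i, outcome in enumerate(outcomes):
        if outcome.endswith("-start"):
            adjacent = start is not None
            if adjacent:
                # Close the name that ends right before this one.
                spans.append((start, i, name_type))
            if not (adjacent and reuse_previous_type):
                # Correct behaviour: read the type from this outcome.
                name_type = outcome[: -len("-start")]
            start = i
        elif outcome.endswith("-cont"):
            # Continuation of the open name; a stray "-cont" is ignored.
            continue
        else:
            # "other": close any open name.
            if start is not None:
                spans.append((start, i, name_type))
                start = None

    if start is not None:
        spans.append((start, len(outcomes), name_type))
    return spans


if __name__ == "__main__":
    # Two adjacent names of different types: tokens 1-2 and token 3.
    outcomes = ["other", "person-start", "person-cont", "misc-start", "other"]
    print(decode_spans(outcomes))
    print(decode_spans(outcomes, reuse_previous_type=True))
```

With a model that only knows one name type, both variants return the same spans, which matches the observation that single-type models were not affected.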
Re: OpenNLP 1.5.3 RC 2 ready for testing
Jörn,

Could you run the German data to get the combined values? I have the values for the 1000 iterations for the combined model.

Testing All Name Finder [de.testa]
Precision: 0.6825576995838063
Recall: 0.37326712187047384
F-Measure: 0.4826110219368647

Testing All Name Finder [de.testb]
Precision: 0.6774332472006891
Recall: 0.4282602777021508
F-Measure: 0.5247706422018349

But the runs you did with 1.5.2 were with 100 iterations for the training.

I'll try to get the CoNLL 2002 data by Wednesday for the namefinder testing results.

Thanks,
James
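As a cross-check when comparing runs, the F-Measure values above are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The Python sketch below recomputes them from the reported precision and recall. The function name f_measure and the 0.0 return for the degenerate case are choices made for this example, not taken from the OpenNLP evaluator.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (a convention chosen for this sketch).
    """
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


if __name__ == "__main__":
    # Precision, recall and F-Measure as reported in the message above.
    reported = {
        "de.testa": (0.6825576995838063, 0.37326712187047384, 0.4826110219368647),
        "de.testb": (0.6774332472006891, 0.4282602777021508, 0.5247706422018349),
    }
    for name, (p, r, f_reported) in reported.items():
        f = f_measure(p, r)
        print(name, f, abs(f - f_reported) < 1e-9)
```

Because the harmonic mean is pulled toward the smaller of the two values, the low recall on both German test sets dominates the resulting F-Measure.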
Re: OpenNLP 1.5.3 RC 2 ready for testing
Hi William,

No, I think it will be fine. The problem only lies in data where back-to-back names are tagged in the sentences. Before the fix, they would be tagged with the wrong type, i.e. both could get the same type, such as person, instead of different types, one person and the other maybe miscellaneous. The combined Name Finder models that contain all the tags were affected most, since the likelihood of back-to-back tags is higher there. In the English models, 3 sentences that had improper tags before now have the correct tags with the fixes. This improved the scores a bit.

It should produce identical models, since the problem was with the output tagging and not with the training of the models.

James

On 3/14/2013 11:00 PM, William Colen wrote:
> Hi, James,
>
> Thank you for the warning. It didn't affect the test with the Leipzig corpus: the output from 1.5.2 and 1.5.3 is identical. Do you think we should manually check the output?
>
> Thank you,
> William
>
> On Thu, Mar 14, 2013 at 12:09 AM, James Kosin james.ko...@gmail.com wrote:
>> Hi all,
>>
>> Note that we will have some discrepancies in the model performance for some of the tests in the NameFinder models due to OPENNLP-417, which fixes the back-to-back name tags. It should really be limited to the combined name tags, but could also affect others.
>>
>> James
>>
>> On 3/8/2013 9:11 AM, William Colen wrote:
>>> Hi all,
>>>
>>> Our second release candidate is ready for testing. RC1 failed to pass the initial quality check.
>>>
>>> The RC 2 can be downloaded from here:
>>> http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/
>>>
>>> To use it in a maven build, set the version for opennlp-tools or opennlp-uima to 1.5.3, and for opennlp-maxent to 3.0.3, and add this URL to your settings.xml file:
>>> https://repository.apache.org/content/repositories/orgapacheopennlp-005/
>>>
>>> The current test plan can be found here:
>>> https://cwiki.apache.org/OPENNLP/testplan153.html
>>>
>>> Please sign up for tasks in the test plan.
>>>
>>> The release plan can be found here:
>>> https://cwiki.apache.org/OPENNLP/releaseplanandtasks153.html
>>>
>>> The RC contains quite a few changes; please refer to the included issue list for details.
>>>
>>> William
Re: OpenNLP 1.5.3 RC 2 ready for testing
Hi all,

Note that we will have some discrepancies in the model performance for some of the tests in the NameFinder models due to OPENNLP-417, which fixes the back-to-back name tags. It should really be limited to the combined name tags, but could also affect others.

James

On 3/8/2013 9:11 AM, William Colen wrote:
> Hi all,
>
> Our second release candidate is ready for testing. RC1 failed to pass the initial quality check.
>
> The RC 2 can be downloaded from here:
> http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/
>
> To use it in a maven build, set the version for opennlp-tools or opennlp-uima to 1.5.3, and for opennlp-maxent to 3.0.3, and add this URL to your settings.xml file:
> https://repository.apache.org/content/repositories/orgapacheopennlp-005/
>
> The current test plan can be found here:
> https://cwiki.apache.org/OPENNLP/testplan153.html
>
> Please sign up for tasks in the test plan.
>
> The release plan can be found here:
> https://cwiki.apache.org/OPENNLP/releaseplanandtasks153.html
>
> The RC contains quite a few changes; please refer to the included issue list for details.
>
> William