You could use the MASC annotations. I have a walkthrough for converting the data to formats suitable for Chalk (and compatible with OpenNLP) here: https://github.com/scalanlp/chalk/wiki/Chalk-command-line-tutorial
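In case it helps to see what the conversion is targeting, here's a rough Scala sketch of writing out OpenNLP-style sentence detector and tokenizer training data from sentences and token spans that have already been pulled out of the MASC annotations. The types and method names are just illustrative (they aren't from Chalk or the tutorial); the wiki page above has the actual steps:

import java.io.{File, PrintWriter}

// Hypothetical input: sentences already extracted from the MASC annotations,
// with tokens given as (start, end) character offsets into the sentence text.
case class Token(start: Int, end: Int)
case class AnnotatedSentence(text: String, tokens: Seq[Token])

object MascToTraining {

  // Sentence detector training data: one sentence per line.
  def writeSentenceData(sents: Seq[AnnotatedSentence], out: File): Unit = {
    val w = new PrintWriter(out, "UTF-8")
    try sents.foreach(s => w.println(s.text.replaceAll("\\s+", " ").trim))
    finally w.close()
  }

  // Tokenizer training data: adjacent tokens with no whitespace between them
  // in the raw text get a <SPLIT> marker, which is the convention OpenNLP's
  // TokenizerTrainer expects.
  def writeTokenizerData(sents: Seq[AnnotatedSentence], out: File): Unit = {
    val w = new PrintWriter(out, "UTF-8")
    try {
      for (s <- sents if s.tokens.nonEmpty) {
        val sb = new StringBuilder(s.text.substring(s.tokens.head.start, s.tokens.head.end))
        for ((prev, cur) <- s.tokens.zip(s.tokens.tail)) {
          sb.append(if (cur.start == prev.end) "<SPLIT>" else " ")
          sb.append(s.text.substring(cur.start, cur.end))
        }
        w.println(sb.toString)
      }
    } finally w.close()
  }
}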
There is still some work to be done in terms of how the annotations are extracted, options for training, and so on, but it does serve as a benchmark.

BTW, I've just recently finished integrating Liblinear into Nak (which is an adaptation of the maxent portion of OpenNLP). I'm still rounding some things out, but so far it is producing more accurate models that train in less time and without using cutoffs. Here's the code:

https://github.com/scalanlp/nak

It is still mostly Java, but the liblinear adaptors are in Scala. I've kept things such that liblinear retrofits to the interfaces that were in opennlp.maxent (a rough sketch of the adaptor shape is at the end of this message), though given how well it is working, I'll be stripping those out and going with liblinear for everything in upcoming versions.

Happy to answer any questions or help out with any of the above if it might be useful!

-Jason

On Fri, Mar 22, 2013 at 8:08 AM, Jörn Kottmann <[email protected]> wrote:

> On 03/22/2013 01:05 PM, William Colen wrote:
>
>> We could do it with the Leipzig corpus, or CONLL. We can prepare the
>> corpus by detokenizing it and creating documents from it.
>>
>> If it is OK to do it with another language, the AD corpus has paragraph
>> and text annotations, as well as the original sentences (not tokenized).
>>
>
> For English we should be able to use some of the CONLL data, yes, and we
> should definitely test with other languages too. Leipzig might be suited
> for sentence detector training, but not for tokenizer training, since the
> data is not tokenized as far as I know.
>
> +1 to use AD and CONLL for testing the tokenizer and sentence detector.
>
> Jörn

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
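For a sense of what the retrofit mentioned above roughly looks like, here is a sketch of a liblinear-backed classifier sitting behind a maxent-style eval(context) interface. This is not Nak's actual code: the class name, the predicate/outcome bookkeeping, and the use of the de.bwaldvogel liblinear-java port are all illustrative.

import de.bwaldvogel.liblinear.{Feature, FeatureNode, Linear, Model}

// Illustrative adaptor: keep the opennlp.maxent-style contract of
// eval(context) => outcome probabilities, delegating the scoring to a
// trained liblinear model. Assumes `outcomes` is ordered to match
// model.getLabels() so probabilities line up with outcome names.
class LiblinearMaxentAdaptor(
    model: Model,
    predIndex: Map[String, Int],  // predicate string -> 1-based liblinear feature index
    outcomes: Array[String]) {

  // Probability distribution over outcomes for the given context predicates.
  def eval(context: Array[String]): Array[Double] = {
    // Map known predicates to feature nodes; unseen predicates are dropped.
    val feats: Array[Feature] = context
      .flatMap(predIndex.get)
      .distinct
      .sorted                                 // liblinear expects ascending indices
      .map(i => new FeatureNode(i, 1.0): Feature)

    val probs = new Array[Double](outcomes.length)
    Linear.predictProbability(model, feats, probs)  // fills probs (LR solvers only)
    probs
  }

  def getBestOutcome(probs: Array[Double]): String =
    outcomes(probs.indexOf(probs.max))
}

The point of keeping an eval() like this around is that downstream OpenNLP-style components don't have to change while the training backend gets swapped out underneath them.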
