On Thu, Nov 17, 2011 at 11:48 AM, Jörn Kottmann <[email protected]> wrote:

> On 11/17/11 11:32 AM, Aliaksandr Autayeu wrote:
>
>>> We shouldn't replace JWNL with a newer version,
>>> because we currently don't have the ability to train
>>> or evaluate the coref component.
>>>
>> +1. Having test coverage eases many things, refactoring and development
>> included :)
>>
>>> This is a big issue for us because that also blocks
>>> other changes and updates to the code itself,
>>> e.g. the cleanups Aliaksandr contributed.
>>>
>>> What we need here is a plan for how we can get the coref component
>>> into a state which makes it possible to develop it in a community.
>>>
>>> If we don't find a way to resolve this, I think we should move the coref
>>> stuff to the sandbox and leave it there until we have some training data.
>>>
>> In my experience, doing things like this is almost equal to deleting the
>> piece of code altogether. On the other hand, if there is no developer
>> actively using and developing this piece, having corpora, tests, etc.,
>> others might not have enough incentive.
>>
>
> That is already the situation: the developer who wrote it doesn't support it
> anymore. The only way to bring it back to life would be to get the training
> and evaluation running. If we have that, it will be possible to continue to
> work on it, and people can start using it. The code itself is easy to
> understand and I have a good idea of how it works.
>
> In the current state it really blocks the development of a few things.
>
>
>>> Another option would be to label enough wikinews data, so we are able to
>>> train it.
>>
>> How much exactly is this "enough"? And what's the annotation UI? This might
>> also be a good opportunity to improve the annotation tools. I might be
>> interested in pursuing this option (only if the corpus produced will be
>> under a free license), mainly to learn :) but I would need some help and
>> supervision.
>>
>
> We are discussing doing a wikinews crowdsourcing project to label
> training data for all components in OpenNLP.
>
> I once wrote a proposal to communicate this idea:
> https://cwiki.apache.org/OPENNLP/opennlp-annotations.html
>
> Currently we have a first version of the Corpus Server, plugins for the
> UIMA Cas Editor (an annotation tool) to access articles in the Corpus
> Server, and an OpenNLP Plugin which can help with sentence detection,
> tokenization and NER (it could be extended with support for coref).
>
> These tools are all located in the sandbox.
>
> I am currently using them to run a private annotation project, and
> therefore have time to work on them.

I'll take a look at them. I also have my own annotation tools, because I
wasn't happy with what was available out there a few years ago, and because
there are some specifics of my situation that can be exploited to speed up
the annotation; but I would be happy to avoid duplication.
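
For my own understanding, here is roughly what I imagine the pre-annotation
step with the plain OpenNLP API to look like (just a sketch under my own
assumptions; the class name, model file names, sample text and the person
entity type are placeholders, not taken from the plugin):

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class PreAnnotateSketch {

    public static void main(String[] args) throws Exception {
        // Pre-trained model files; the file names here are just placeholders.
        try (InputStream sentIn = new FileInputStream("en-sent.bin");
             InputStream tokIn = new FileInputStream("en-token.bin");
             InputStream nerIn = new FileInputStream("en-ner-person.bin")) {

            SentenceDetectorME sentenceDetector =
                new SentenceDetectorME(new SentenceModel(sentIn));
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));
            NameFinderME nameFinder =
                new NameFinderME(new TokenNameFinderModel(nerIn));

            String article = "Pierre Vinken joined the board. He is 61 years old.";

            // Sentence detection -> tokenization -> NER, the order the
            // annotation plugin would have to follow to suggest spans.
            for (String sentence : sentenceDetector.sentDetect(article)) {
                String[] tokens = tokenizer.tokenize(sentence);

                for (Span name : nameFinder.find(tokens)) {
                    StringBuilder text = new StringBuilder();
                    for (int i = name.getStart(); i < name.getEnd(); i++) {
                        text.append(tokens[i]).append(' ');
                    }
                    // In the plugin these suggested spans would become editable
                    // annotations in the Cas Editor instead of being printed.
                    System.out.println(name.getType() + ": " + text.toString().trim());
                }
            }

            // Clear document-level adaptive features between articles.
            nameFinder.clearAdaptiveData();
        }
    }
}

If the plugin works roughly along these lines, I can see where coref support
would slot in after the NER step.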

Aliaksandr
