2011/6/22 Jörn Kottmann <[email protected]>:
> On 6/22/11 10:45 AM, Olivier Grisel wrote:
>>
>> I find the UIMA CAS API much more complicated to work with than
>> working directly with token-level concepts through the OpenNLP API (i.e.
>> with arrays of Span). I haven't had a look at the opennlp-uima
>> subproject though: you probably already have tooling and predefined
>> type systems that make interoperability with CAS instances less of a
>> pain.
>
> If you look at annotation tools, they usually give the user some
> flexibility in terms of what kinds of annotations they are allowed to add.
> One thing I always see is that as soon as they allow more complex
> annotations, the tools and the code that handles the annotations also
> get complex. Have a look at WordFreak or GATE.
>
> The CAS might be difficult to use at first, but at least it works and is
> very well tested. If we create a custom solution we might end up with
> a similar complexity anyway.
>
> We would need to define a type system, but that is something we need
> to do anyway, regardless of how we implement it.
> Maybe we even need to support different type systems for different corpora.
> I guess we start with Wikipedia-based data, but one day we might want to
> annotate an email or blog corpus.
>
> It is an interesting question what the type system should look like, since
> we need to track where the annotations come from, and might even want some
> to be double-checked, or need to annotate disagreement between annotators.

Point taken.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
