On 06/24/2011 06:37 PM, Jörn Kottmann wrote:
> I suggest that there are two classes of types in the type system.
>
> The first class contains annotations which describe the input we
> collect from our annotators and which are also suitable for
> documenting comments and disagreements between annotators.
>
> And the second class of annotations contains standard linguistic
> annotations such as sentences, tokens, entities, chunks, parses,
> etc.
+1
> The idea is that the annotations in the second class can
> automatically be derived from the annotations in the first class. If
> an article is not completely labeled, the statistical models could
> fill the gaps.
Do you mean we take the user annotations above a certain agreement level
from the first-class types and turn them into second-class types to get
the gold annotations? For entities this is no problem, but where do we
start with tokens and sentences? I think we initially apply the current
OpenNLP sentence splitter and tokenizer, right?
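To make the agreement idea concrete, here is a minimal Python sketch. It
assumes, hypothetically, that each annotator's entity annotations are
stored as a set of (start_token, end_token, type) tuples keyed by
annotator name, and that "agreement" simply means the fraction of
annotators who marked exactly the same span; real inter-annotator
agreement measures (e.g. Cohen's kappa) are more involved.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (start_token, end_token, type), end exclusive.
Span = Tuple[int, int, str]


def gold_spans(user_annotations: Dict[str, Set[Span]],
               min_agreement: float) -> List[Span]:
    """Keep spans marked by at least min_agreement of all annotators."""
    n = len(user_annotations)
    if n == 0:
        return []
    # Each annotator holds a set, so each one votes at most once per span.
    votes = Counter(span for spans in user_annotations.values() for span in spans)
    return sorted(span for span, count in votes.items()
                  if count / n >= min_agreement)


annotations = {
    "alice": {(0, 2, "PER"), (5, 6, "LOC")},
    "bob": {(0, 2, "PER")},
    "carol": {(0, 2, "PER"), (5, 6, "ORG")},
}
print(gold_spans(annotations, 0.5))
```

Here only the PER span reaches the threshold; the disagreement on
tokens 5-6 (LOC vs. ORG) stays in the first-class annotations.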
> For example, we could ask the annotators to label token splits; from
> these token splits we can derive the actual token annotations. For
> English texts the annotation UI could make use of the alphanumeric
> optimization and only ask the user about questionable token splits.
OK, so similar to the entities, the UI needs to show the token boundaries
and offer functionality to change them. Or do you want this
functionality in a different UI than the named entity one?
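A minimal sketch of deriving token annotations from token splits, and of
picking the split positions worth asking about. It assumes,
hypothetically, that splits are stored as character offsets, that
whitespace always separates tokens, and that a position between two
ASCII alphanumeric characters is never split. That last rule is a
simplification in the spirit of OpenNLP's alphanumeric optimization,
not its actual implementation.

```python
import re
from typing import List, Set, Tuple


def _is_alnum(ch: str) -> bool:
    # ASCII letters and digits only, matching the English-text assumption.
    return ch.isascii() and ch.isalnum()


def whitespace_chunks(text: str) -> List[Tuple[int, int]]:
    """Character spans of whitespace-delimited chunks."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def questionable_splits(text: str) -> List[int]:
    """Offsets inside chunks where a split is plausible and the UI should ask.

    Positions between two alphanumeric characters are skipped, so purely
    alphanumeric chunks such as "world" never produce a question.
    """
    candidates = []
    for start, end in whitespace_chunks(text):
        for i in range(start + 1, end):
            if not (_is_alnum(text[i - 1]) and _is_alnum(text[i])):
                candidates.append(i)
    return candidates


def tokens_from_splits(text: str, splits: Set[int]) -> List[Tuple[int, int]]:
    """Token spans: each chunk cut at the confirmed split offsets inside it."""
    tokens = []
    for start, end in whitespace_chunks(text):
        cuts = [start] + sorted(s for s in splits if start < s < end) + [end]
        tokens.extend(zip(cuts, cuts[1:]))
    return tokens


text = "Hello, world"
print(questionable_splits(text))        # offsets to ask the annotator about
print(tokens_from_splits(text, {5}))    # tokens after the split is confirmed
```

For "Hello, world" the annotator is asked only about offset 5 (between
"o" and ","); confirming it yields the tokens "Hello", "," and "world".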
> For named entity annotations the user could do BIO-style token
> labeling through a special UI, similar to the one in Walter. The BIO
> labels can then be used to compute the name spans.
Until I read this post I thought we would use the name spans to
compute the BIO labels, not the other way round. But if we show the
tokens as single blocks, then it makes sense to use some sort of
BIO-style annotation.
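A minimal sketch of computing name spans from BIO labels. The label
strings ("B-PER", "I-PER", "O") follow the convention used in this
thread; spans are (start, end, type) with an exclusive end token index.
How to treat an "I-X" that does not follow an "X" entity is an
assumption here: it simply starts a new entity.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-token BIO labels into (start, end, type) name spans."""
    spans = []
    start, kind = None, None
    for i, label in enumerate(labels):
        prefix, typ = ("O", None) if label == "O" else label.split("-", 1)
        # Only "I-X" directly continuing an open X entity extends it.
        continues = prefix == "I" and kind == typ
        if start is not None and not continues:
            spans.append((start, i, kind))  # close the open entity
            start, kind = None, None
        if prefix in ("B", "I") and not continues:
            start, kind = i, typ  # open a new entity
    if start is not None:
        spans.append((start, len(labels), kind))
    return spans


print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "B-LOC", "I-ORG"]))
```

The two consecutive "B-LOC" labels produce two separate one-token
locations, which is exactly what a second "B" keystroke is meant to
express.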
For example, the user navigates over the tokens with the left and right
arrow keys. If he hits "P" (for "B-PER"), the current token is marked and
the focus moves to the next token. Hitting "p" marks it as "I-PER",
hitting "P" another time marks it as a new entity ("B-PER"), and hitting
"space" marks it as "O", i.e., removes a previous annotation. The arrow
keys don't change the label. Feels pretty usable in my mind. :)
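A minimal sketch of that keyboard interaction as a small state machine.
The key names "Left" and "Right" and the class name are hypothetical;
only the PER bindings from the mail are included, and it assumes that
"p" and "space" advance the focus just like "P" does, which the
description leaves open.

```python
from typing import List


class BioLabelingSession:
    """Hypothetical model of a keyboard-driven BIO labeling UI."""

    # Assumed bindings; other entity types would get their own keys.
    LABEL_KEYS = {"P": "B-PER", "p": "I-PER", " ": "O"}

    def __init__(self, num_tokens: int):
        if num_tokens < 1:
            raise ValueError("need at least one token")
        self.labels: List[str] = ["O"] * num_tokens
        self.focus = 0

    def _move(self, delta: int) -> None:
        # Keep the focus within the token range.
        self.focus = max(0, min(len(self.labels) - 1, self.focus + delta))

    def press(self, key: str) -> None:
        if key == "Left":
            self._move(-1)  # arrow keys never change a label
        elif key == "Right":
            self._move(1)
        elif key in self.LABEL_KEYS:
            self.labels[self.focus] = self.LABEL_KEYS[key]
            self._move(1)  # assumption: every label key advances the focus


session = BioLabelingSession(4)
for key in ["P", "p", "P", "Left", "Left", " "]:
    session.press(key)
print(session.labels)
```

Here the user labels a two-token person followed by a new one-token
person, then steps back and clears the second token with "space",
leaving two separate one-token entities.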
Hannes