On 05/12/13 10:04, Jens Grivolla wrote:

> I agree that it might make more sense to model our needs more directly
> instead of trying to squeeze it into the schema we normally use for text
> processing.  But at the same time I would of course like to avoid having
> to reimplement many of the things that are already available when using
> AnnotationBase.
>
> For the cross-view indexing issue I was thinking of creating individual
> views for each modality and then a merged view that just contains a
> subset of annotations of each view, and on which we would do the
> cross-modal reasoning.
>
> I just looked again at the GaleMultiModalExample (not much there,
> unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase
> but still has float values for begin/end.  I would be really interested
> in learning more about what was done in GALE, but it's hard to find any
> relevant information...
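
On the merged view idea: since annotations (AnnotationBase subtypes) can
only be indexed in the view they were created in, the merged view would
probably hold small reference feature structures pointing at the
per-modality annotations, rather than the annotations themselves. A rough
sketch with the plain CAS API - the view name and the CrossModalRef type
(with its audioRef/textRef features) are made up here and would have to be
declared in your type system descriptor:

  import org.apache.uima.cas.CAS;
  import org.apache.uima.cas.Feature;
  import org.apache.uima.cas.FeatureStructure;
  import org.apache.uima.cas.Type;
  import org.apache.uima.cas.text.AnnotationFS;

  public class MergedViewSketch {

    // "merged" and org.example.CrossModalRef are invented names; the view
    // is assumed to have been created beforehand with
    // baseCas.createView("merged").
    static void linkInMergedView(CAS baseCas, FeatureStructure audioAnn,
        AnnotationFS textAnn) {
      CAS merged = baseCas.getView("merged");

      Type refType = merged.getTypeSystem().getType("org.example.CrossModalRef");
      Feature audioRef = refType.getFeatureByBaseName("audioRef");
      Feature textRef = refType.getFeatureByBaseName("textRef");

      // The reference FS lives and is indexed in the merged view; the
      // annotations it points at stay indexed in their own modality views.
      FeatureStructure ref = merged.createFS(refType);
      ref.setFeatureValue(audioRef, audioAnn);
      ref.setFeatureValue(textRef, textAnn);
      merged.addFsToIndexes(ref);
    }
  }
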
The readme at
http://svn.apache.org/repos/asf/uima/sandbox/trunk/GaleMultiModalExample/README.txt
points to two papers with more details on the GALE multi-modal application.

A portion of the view model was like this:

   Audio view - a sofa reference to the audio data, which was passed in
   parallel to multiple ASR annotators. Each ASR annotator put its
   transcription in the view, where annotations contained ASR engine IDs.

   Transcription views - a text sofa with the transcription from one ASR
   output. Annotations for each word referenced the lexeme annotations in
   the audio view. Multiple MT annotators would receive each transcription
   view and add their translations to the view.

   Translation views - a text sofa with one of the translations, based on
   a combination of ASR engine and MT engine. Annotations in a translation
   view referenced the annotations in a transcription view.
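
In terms of the current API, setting up that kind of layering might look
roughly like this (view names, MIME type, and parameters are invented for
illustration; the GALE components of course did this inside their
annotators):

  import org.apache.uima.cas.CAS;

  public class GaleStyleViewsSketch {

    static void setUpViews(CAS baseCas, String audioUri,
        String transcriptionText, String translationText) {

      // Audio view: the sofa is a reference (URI) to the audio data itself.
      CAS audioView = baseCas.createView("AudioView");
      audioView.setSofaDataURI(audioUri, "audio/wav");

      // One transcription view per ASR engine; its sofa is the ASR text.
      // Word annotations added here would carry features referencing the
      // lexeme annotations in the audio view.
      CAS transcriptionView = baseCas.createView("TranscriptionView-asr1");
      transcriptionView.setDocumentText(transcriptionText);

      // One translation view per ASR/MT combination; its sofa is the MT
      // output. Its annotations would in turn reference the annotations in
      // the transcription view they were derived from.
      CAS translationView = baseCas.createView("TranslationView-asr1-mt1");
      translationView.setDocumentText(translationText);
    }
  }
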

There were more views. The points here are that 1) views were designed to
hold a particular SOFA to be processed by analytics appropriate for that
modality, 2) each derived view had cross-references to the annotations in
the views it was derived from, and 3) at the end the GUI presenting the
final translation could, for any word(s), show the particular piece of
transcription it came from, and/or play the associated audio segment.
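
Point 3 falls out of point 2: if each derived annotation keeps a feature
pointing back at the annotation it came from, the GUI just walks those
references. A sketch, assuming hypothetical feature names ("source" on the
translation-word type, "lexeme" on the transcription-word type) and the
AudioSpan-style float begin/end that Jens mentioned:

  import org.apache.uima.cas.Feature;
  import org.apache.uima.cas.FeatureStructure;
  import org.apache.uima.cas.text.AnnotationFS;

  public class TraceBackSketch {

    // Feature names are made up; the actual GALE type system may differ.
    static void traceBack(AnnotationFS translatedWord) {
      // Translation word -> transcription word it was derived from.
      Feature sourceFeat =
          translatedWord.getType().getFeatureByBaseName("source");
      AnnotationFS transcriptionWord =
          (AnnotationFS) translatedWord.getFeatureValue(sourceFeat);
      System.out.println("Transcribed as: "
          + transcriptionWord.getCoveredText());

      // Transcription word -> lexeme annotation in the audio view
      // (AudioSpan-style, with float begin/end offsets into the audio).
      Feature lexemeFeat =
          transcriptionWord.getType().getFeatureByBaseName("lexeme");
      FeatureStructure audioSpan = transcriptionWord.getFeatureValue(lexemeFeat);
      float begin = audioSpan.getFloatValue(
          audioSpan.getType().getFeatureByBaseName("begin"));
      float end = audioSpan.getFloatValue(
          audioSpan.getType().getFeatureByBaseName("end"));
      System.out.println("Play audio segment [" + begin + ", " + end + "]");
    }
  }
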

Eddie
