Tommaso - could you take a look?

-Marshall

On 11/20/2014 3:09 PM, Vadym Oliinyk (JIRA) wrote:
> Vadym Oliinyk created UIMA-4115:
> -----------------------------------
>
>              Summary: TikaAnnotator: incorrect order of tags processing
>                  Key: UIMA-4115
>                  URL: https://issues.apache.org/jira/browse/UIMA-4115
>              Project: UIMA
>           Issue Type: Bug
>           Components: addons
>     Affects Versions: 2.3.1Addons
>             Reporter: Vadym Oliinyk
>
>
> org.apache.uima.tika.MarkupAnnotator outputs incorrect content due to a bug in
> org.apache.uima.tika.MarkupHandler. The problem is located in the end element
> event handler: the MarkupHandler#endElement method should close open tags by
> removing them from the stack (the last added tag should be removed first when
> the corresponding end tag is found). But the current implementation removes
> start elements beginning from the first open element, which results in
> incorrect text spans being annotated by the processor.
>
> The fix is trivial:
> in MarkupHandler#endElement replace startedAnnotations.iterator() with 
> startedAnnotations.descendingIterator().
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
>
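
For anyone reviewing the report, the iteration-order issue it describes can be
reproduced in isolation. The following is only a minimal sketch, not the actual
MarkupHandler code: the class TagStackSketch, the Open holder, and the methods
closeTag and run are hypothetical names, and it assumes open tags are appended
to the tail of a Deque as they start. Under that assumption, iterator() walks
from the oldest open tag, while descendingIterator() walks from the newest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the handler's bookkeeping; not the UIMA source.
class TagStackSketch {

    /** An open tag: its name and the text offset where it started. */
    static final class Open {
        final String tag;
        final int begin;
        Open(String tag, int begin) { this.tag = tag; this.begin = begin; }
    }

    /**
     * Closes the first open tag named `tag` found in iteration order and
     * returns the resulting span as "tag[begin,end)", or null if none is open.
     * newestFirst=false mimics iterator(); true mimics descendingIterator().
     */
    static String closeTag(Deque<Open> started, String tag, int end, boolean newestFirst) {
        Iterator<Open> it = newestFirst ? started.descendingIterator() : started.iterator();
        while (it.hasNext()) {
            Open open = it.next();
            if (open.tag.equals(tag)) {
                it.remove(); // drop the matched start element from the stack
                return tag + "[" + open.begin + "," + end + ")";
            }
        }
        return null;
    }

    /** Simulates the events for <div>ab<div>cd</div>ef</div>. */
    static List<String> run(boolean newestFirst) {
        Deque<Open> started = new ArrayDeque<>();
        List<String> spans = new ArrayList<>();
        started.addLast(new Open("div", 0)); // outer <div> starts at offset 0
        started.addLast(new Open("div", 2)); // inner <div> starts at offset 2
        spans.add(closeTag(started, "div", 4, newestFirst)); // inner </div>
        spans.add(closeTag(started, "div", 6, newestFirst)); // outer </div>
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // oldest-first: [div[0,4), div[2,6)]
        System.out.println(run(true));  // newest-first: [div[2,4), div[0,6)]
    }
}
```

With oldest-first iteration the inner end tag is paired with the outer start
tag, so both spans are wrong; newest-first iteration gives the expected nested
spans. If the real handler pushes onto the head of the deque instead, the
direction would be reversed, so the assumption above is worth checking against
the source.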
