[ 
https://issues.apache.org/jira/browse/OPENNLP-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185057#comment-14185057
 ] 

Joern Kottmann commented on OPENNLP-676:
----------------------------------------

Before we used those iterators, the iteration code had O(mn) complexity, where 
m is the number of sentences and n is the number of tokens. Measurements showed 
that this noticeably slowed down processing for very large documents.

The new solution tries to overcome that problem by using two iterators, one 
over sentences and one over tokens, which are advanced in lockstep.
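
To illustrate the lockstep idea, here is a minimal sketch. It is not the 
actual AnnotationComboIterator code: the Span class, the method names and the 
example offsets are made up for illustration, and it assumes both lists are 
sorted by begin offset and that sentences do not overlap. Each token is 
visited once, so the cost is roughly O(m + n) instead of O(mn), and hasNext() 
is checked before every call to next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LockstepSketch {

    /** Hypothetical stand-in for an annotation: offsets [begin, end). */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the next element, or null when the iterator is exhausted. */
    private static Span nextOrNull(Iterator<Span> it) {
        return it.hasNext() ? it.next() : null;
    }

    /**
     * Groups tokens by covering sentence in a single pass over each list.
     * Assumes both lists are sorted by begin offset and sentences do not overlap.
     */
    public static List<List<Span>> tokensPerSentence(List<Span> sentences, List<Span> tokens) {
        List<List<Span>> result = new ArrayList<>();
        Iterator<Span> tokenIt = tokens.iterator();
        Span token = nextOrNull(tokenIt);

        for (Span sentence : sentences) {
            List<Span> covered = new ArrayList<>();

            // Skip tokens that start before this sentence.
            while (token != null && token.begin < sentence.begin) {
                token = nextOrNull(tokenIt);
            }

            // Consume tokens that start inside this sentence. Tokens that
            // run past the sentence end are consumed but not collected.
            while (token != null && token.begin < sentence.end) {
                if (token.end <= sentence.end) {
                    covered.add(token);
                }
                token = nextOrNull(tokenIt);
            }

            result.add(covered);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up layout: 9 characters, 2 sentences, 9 one-character tokens.
        List<Span> sentences = new ArrayList<>();
        sentences.add(new Span(0, 4));
        sentences.add(new Span(4, 9));

        List<Span> tokens = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            tokens.add(new Span(i, i + 1));
        }

        for (List<Span> group : tokensPerSentence(sentences, tokens)) {
            System.out.println(group.size()); // prints 4, then 5
        }
    }
}
```

The token iterator is never reset between sentences, which is what removes 
the nested loop over all tokens for every sentence.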


> POSTagger UIMA AE broken because of AnnotationComboIterator
> -----------------------------------------------------------
>
>                 Key: OPENNLP-676
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-676
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: POS Tagger, UIMA Integration
>    Affects Versions: tools-1.5.3
>         Environment: Oracle JDK8, Debian Jessie 64b
>            Reporter: Hugo Mougard
>            Assignee: Joern Kottmann
>             Fix For: 1.6.0
>
>
> The AnnotationComboIterator helper class used by the UIMA POSTagger accesses 
> its iterators unsafely.
> The consequence is that the AE breaks even on very simple CASes such as the 
> CAS showcased on this repository (text of 9 letters, 2 sentence annotations 
> and 9 token annotations): 
> https://github.com/m09/postagger-iterator-bug/blob/master/in.xmi
> The repository linked above contains an example program that crashes on my 
> setup. It is set up as a Maven 3 project, so it should be easy to launch.
> Here is a patch that should address the issue: 
> https://raw.githubusercontent.com/m09/postagger-iterator-bug/master/iterator.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
