[ 
https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647707#comment-13647707
 ] 

Richard Eckart de Castilho commented on UIMA-2434:
--------------------------------------------------

I have not had good experiences with removing FSes from indexes while 
iterating over them. UIMA should probably follow the Java collection 
conventions here, which afaik say that removing an item via the iterator itself 
does not invalidate that iterator, but any direct modification of the underlying 
collection requires obtaining new iterators.
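
For reference, the Java collection convention mentioned above can be sketched 
with a plain java.util.ArrayList. This illustrates only the standard library's 
behaviour, not UIMA's FSIterator; the class and method names are made up for 
the example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; it does not use any UIMA API.
class IteratorRemovalSketch {

    // Removing through the iterator keeps the iterator valid.
    static List<Integer> removeEvensViaIterator(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // allowed: the iterator updates its own state
            }
        }
        return list;
    }

    // Modifying the list directly makes the existing iterator fail fast.
    // Assumes the input has at least two elements.
    static boolean directRemovalFailsFast(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        it.next();
        list.remove(0); // structural change made behind the iterator's back
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // a new iterator would be needed from here on
        }
    }
}
```

Note that the fail-fast check in java.util is documented as best effort, so 
code should not rely on the exception for correctness.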
                
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>
>                 Key: UIMA-2434
>                 URL: https://issues.apache.org/jira/browse/UIMA-2434
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3.1SDK
>            Reporter: Mikhail Sogrin
>            Assignee: Marshall Schor
>             Fix For: 2.4.1SDK
>
>
> Removal of feature structures from sorted indexes (e.g. the default index) is 
> very slow. The FSIntArrayIndex.remove() method performs two operations: a linear 
> search of the array until the given FS is found, followed by shifting all 
> elements after it one position to the left.
> If many annotations (millions or more) are deleted at once, this 
> operation becomes very slow, much slower than adding these annotations in 
> the first place. It seems to require O(N^2) time to remove N annotations.
> First, the linear search can be replaced by the binary search 
> method that is already implemented in the same class.
> Second, the array copy can be done with a Java built-in method instead of a custom 
> loop.
> Ideally, a method for bulk removal of a collection of annotations would be 
> the most efficient, for example a method to remove all annotations of a 
> given type.
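
The suggestions in the description above could look roughly like the following 
sketch. It is not the FSIntArrayIndex code: it assumes a plain ascending int 
array with unique values, and the class and method names are hypothetical. A 
real sorted FS index orders entries with a comparator and may hold several 
entries that compare equal, so a binary search there would still need to locate 
the exact FS among its equal neighbours.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a sorted int-array index (assumes unique values).
class SortedIntArraySketch {
    private final int[] data;
    private int size;

    SortedIntArraySketch(int[] sortedValues) {
        this.data = Arrays.copyOf(sortedValues, sortedValues.length);
        this.size = sortedValues.length;
    }

    // Single removal: binary search, then one System.arraycopy for the shift.
    boolean remove(int value) {
        int pos = Arrays.binarySearch(data, 0, size, value);
        if (pos < 0) {
            return false; // not present
        }
        System.arraycopy(data, pos + 1, data, pos, size - pos - 1);
        size--;
        return true;
    }

    // Bulk removal: one pass that compacts the kept elements in place.
    int removeAll(IntPredicate shouldRemove) {
        int kept = 0;
        for (int i = 0; i < size; i++) {
            if (!shouldRemove.test(data[i])) {
                data[kept++] = data[i];
            }
        }
        int removed = size - kept;
        size = kept;
        return removed;
    }

    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

With this layout a single remove() still shifts the tail of the array, so 
removing N elements one at a time remains quadratic in the worst case, only 
with a smaller constant. The one-pass removeAll() is what brings bulk deletion 
down to linear time.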

