[ 
https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648345#comment-13648345
 ] 

Mikhail Sogrin commented on UIMA-2434:
--------------------------------------

Like Richard, I had no luck removing annotations via an iterator, so had to 
collect them in an array and then remove annotations in a separate pass without 
using iterators.

So space reclaiming removal is generally more important than non-invalidation 
of iterators, as it provides real benefits to applications.

Two different bulk removal methods can be implemented: (1) removal of all 
annotations of a given type, like Marshall described; and (2) removal of 
annotations present in a given collection, similar to Java 
Collection.removeAll(Collection), which can be made more efficient than the 
current code due to shifting each element of the array at most one time instead 
of possible N times.

And one more thing. In comment 2, Marshall describes that old binary search 
method finds only an element with the same key, but not necessarily with the 
exact value (as far as I remember the key includes 'begin', 'end' and type 
name, but not other features). But internal find(int) method uses this inexact 
binary search, and then is used in several public methods like 
contains(FeatureStructure), which now can return incorrect result (i.e. return 
"true" when key matches on a FS with the same span and type but different 
feature values).
                
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>
>                 Key: UIMA-2434
>                 URL: https://issues.apache.org/jira/browse/UIMA-2434
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3.1SDK
>            Reporter: Mikhail Sogrin
>            Assignee: Marshall Schor
>             Fix For: 2.4.1SDK
>
>
> Removal of feature structures from sorted indexes (e.g. default index) is 
> very slow. FSIntArrayIndex.remove() method performs two operations: linear 
> search in the array until the given FS is found, followed by the shift of 
> elements to the end of this array by one position to the left.
> If many annotations (millions and more) are being deleted at once, this 
> operation gets very very slow - much slower than adding these annotations in 
> the first place. It seems to require O(N^2) time to remove N annotations.
> One item is the linear search, which can be replaced by the binary search 
> method, which is already implemented in the same class.
> Second, array copy can be done with Java built-in method instead of a custom 
> loop.
> Ideally, a method for bulk removal of a collection of annotations would have 
> been the most efficient, for example a method to remove all annotations of a 
> given type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to