Re: FilteredIterator is very slow

Thomas Ginter Mon, 31 Mar 2014 12:57:34 -0700

Larry,

A faster way to get the annotations of the types you want to skip is to 
request an index for each type directly:

FSIndex<TitlePersonHonorificAnnotation> titlePersonHAIndex = 
aJCas.getAnnotationIndex(TitlePersonHonorificAnnotation.type);

Doing this for each type yields an index that points to just the annotations 
of that type in the CAS.  From there you can get an iterator 
( titlePersonHAIndex.iterator() ) and either traverse each index separately or 
add the annotations to a common Collection such as an ArrayList and iterate 
through that (if you merge several indexes into one list, re-sort it first).  
You could also take advantage of the fact that the default annotation index in 
UIMA sorts in ascending order of begin offset and then descending order of end 
offset, so you can stop as soon as you reach an annotation that begins past 
the end offset of the dictTerm.  

An important design decision, though, is whether the dictTerm annotations are 
much more numerous than the TitlePersonHonorificAnnotation, 
MeasurementAnnotation, and ProgFactorTerm filter annotations.  Generally, if 
the filter annotations are plentiful and dictTerms are rare, then looking for 
overlapping filter annotations for each dictTerm will yield fewer iterations 
of your algorithm.  However, if there are a lot of dictTerm occurrences and 
only a few filter annotations, it may be more efficient to iterate through the 
filter annotations and eliminate the dictTerms that they overlap or cover.  
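
The second option (driving the loop from the filter side) could look roughly 
like the sketch below.  Again this is plain Java rather than UIMA code: Span 
is a hypothetical stand-in for an annotation, and the half-open overlap test 
is an assumption about what counts as overlapping.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterDrivenSketch {

    /** Hypothetical stand-in for a UIMA annotation: only begin/end offsets. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Returns the terms that no filter span overlaps, leaving the input lists untouched. */
    static List<Span> keepNonOverlapping(List<Span> terms, List<Span> filters) {
        List<Span> kept = new ArrayList<>(terms);
        for (Span f : filters) {
            // Drop every remaining term that this filter span overlaps or covers.
            kept.removeIf(t -> t.begin < f.end && f.begin < t.end);
            if (kept.isEmpty()) {
                break; // nothing left to eliminate
            }
        }
        return kept;
    }
}
```

With only a handful of filter annotations the outer loop runs only a few 
times, and the list of surviving dictTerms shrinks as it goes.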

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu

On Mar 31, 2014, at 11:47 AM, Kline, Larry <larry.kl...@mckesson.com> wrote:

> When I use a filtered FSIterator it's an order of magnitude slower than a 
> non-filtered iterator.  Here's my code:
> 
> Create the iterator:
>       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
>              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>              FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
>              constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
>              constraint.add((new MeasurementAnnotation(aJCas)).getType());
>              constraint.add((new ProgFactorTerm(aJCas)).getType());
>              it = aJCas.createFilteredIterator(it, constraint);
>              return it;
>       }
> Use the iterator:
>       public void process(JCas aJCas) throws AnalysisEngineProcessException {
>              ...
> // The following is done in a loop
>                           if (shouldSkip(dictTerm, skipIter))
>                                  continue;
>              ...
>       }
> Here's the method called:
>       private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter) throws CASException {
>              boolean shouldSkip = false;
>              skipIter.moveToFirst();
>              while (skipIter.hasNext()) {
>                     Annotation annotation = skipIter.next();
>                     if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
>                           shouldSkip = true;
>                           break;
>                     }
>              }
>              return shouldSkip;
>       }
> 
> If I change the method, createConstrainedIterator(), to this (that is, no 
> constraints):
>       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
>              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>              return it;
>       }
> 
> It runs literally 10 times faster.  Doing some profiling I see that all of 
> the time is spent in the skipIter.moveToFirst() call.  I also tried creating 
> the filtered iterator each time anew in the shouldSkip() method instead of 
> passing it in, but that has even slightly worse performance.
> 
> Given this performance I suppose I should probably use a non-filtered 
> iterator and just check for the types I'm interested in inside the loop.
> 
> Any other suggestions welcome.
> 
> Thanks,
> Larry Kline
> 
>
