[ https://issues.apache.org/jira/browse/CTAKES-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981527#comment-13981527 ]
ASF subversion and git services commented on CTAKES-16:
-------------------------------------------------------
Commit 1590127 from [~tmill] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1590127 ]
CTAKES-16: Fix to use UIMAFit select instead of iterator.
> use uimaFIT's selectCovered() instead of UIMA's subiterator
> -----------------------------------------------------------
>
> Key: CTAKES-16
> URL: https://issues.apache.org/jira/browse/CTAKES-16
> Project: cTAKES
> Issue Type: Improvement
> Components: ctakes-assertion, ctakes-chunker,
> ctakes-clinical-pipeline, ctakes-context-tokenizer, ctakes-core,
> ctakes-dependency-parser, ctakes-ne-contexts, ctakes-pos-tagger
> Reporter: Pei Chen
> Priority: Minor
>
> Could not get consistent results from .subiterator when using uimaFIT with
> the cTAKES GUI (which wires the components together dynamically).
> To get all the BaseTokens for a particular sentence, if we use the
> .subiterator, the types have to be stored in the FS indexes in a certain
> order; otherwise it could just return an empty list. This would require the
> users of annotators to understand the ordering of types and have it
> preconfigured.
> FSIterator<Annotation> tokensInSentenceIterator =
> jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> uimaFIT already created a convenience method that seems to do something
> similar which will always return the expected tokens. Does anyone know if
> this was part of the motivation? Is the performance hit (if any) worth the
> ease of use?
> Ex:
> List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>     BaseToken.class, sentence);
> Another alternative is UIMA's FilteredIterator.
> There are a few places that use subiterator in cTAKES and it's tempting to
> use uimaFIT's JCasUtil.selectCovered() instead... What do others think?
> Background: This issue surfaced when we use the cTAKES GUI (which uses
> uimaFIT to wire the components together instead of the Aggregate XML
> descriptor).
> --Pei
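To illustrate the behaviour the description asks for, here is a minimal plain-Java sketch of offset-based "covered" selection. It is not uimaFIT's implementation and uses no UIMA classes: Span and this selectCovered are hypothetical stand-ins, and the assumption is simply that a token belongs to a sentence when its offsets lie entirely within the sentence's offsets, regardless of how types are ordered in any index.

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredSelectSketch {

    /** Hypothetical stand-in for an annotation: only character offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Keeps every candidate that lies entirely inside the cover's offsets. */
    static List<Span> selectCovered(List<Span> candidates, Span cover) {
        List<Span> covered = new ArrayList<>();
        for (Span s : candidates) {
            if (s.begin >= cover.begin && s.end <= cover.end) {
                covered.add(s);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        Span sentence = new Span(0, 20);
        List<Span> tokens = new ArrayList<>();
        tokens.add(new Span(0, 3));   // inside the sentence
        tokens.add(new Span(4, 11));  // inside the sentence
        tokens.add(new Span(18, 25)); // runs past the sentence end, so excluded
        System.out.println(selectCovered(tokens, sentence).size());
    }
}
```

Because the check looks only at begin and end offsets, the result does not depend on any preconfigured ordering of types, which is the property the description says .subiterator lacked.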
> On Aug 9, 2012, at 9:18 AM, Chen, Pei wrote:
> > To get all the BaseTokens for a particular sentence, if we use the
> > .subiterator, the types have to be stored in the FS indexes in a certain
> > order; otherwise it could just return an empty list. This would require the
> > users of annotators to understand the ordering of types and have it
> > preconfigured.
> > FSIterator<Annotation> tokensInSentenceIterator =
> > jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> > uimaFIT already created a convenience method that seems to do something
> > similar which will always return the expected tokens. Does anyone know if
> > this was part of the motivation?
> Yes, that was exactly the motivation to avoid using subiterators. Our
> experience in uimaFIT was that subiterators never did what you wanted them
> to do.
> > Is the performance hit (if any) worth the ease of use?
> I doubt there's a performance hit. Take a look at the source for
> JCasUtil.selectCovered vs. org.apache.uima.cas.impl.Subiterator. If anything,
> selectCovered is probably doing less.
> But of course you could time it and find out for sure.
> Steve
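The suggestion to time the two approaches could be tried with a rough harness like the one below. This is only a sketch: TimingSketch and averageNanos are hypothetical names, and the two Runnables in main are placeholder workloads standing in for the .subiterator call and the selectCovered call, which would need a real JCas to exercise. Hand-rolled timings like this are easily skewed by JIT warm-up and garbage collection, so a benchmark framework such as JMH would give more trustworthy numbers.

```java
public class TimingSketch {

    /** Runs the task a few times untimed, then returns the mean time per timed run. */
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        // Placeholder workloads; replace with the two token-lookup calls under test.
        Runnable approachA = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum); // keeps the loop from being discarded
            }
        };
        Runnable approachB = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                System.out.println(sb); // keeps the loop from being discarded
            }
        };
        System.out.println("A: " + averageNanos(approachA, 100, 1_000) + " ns/run");
        System.out.println("B: " + averageNanos(approachB, 100, 1_000) + " ns/run");
    }
}
```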
> The full discussion thread can be found here:
> http://markmail.org/search/+list:org.apache.incubator.ctakes-dev#query:%20list%3Aorg.apache.incubator.ctakes-dev+page:1+mid:hcp3rudjelddo2dy+state:results