[ https://issues.apache.org/jira/browse/CTAKES-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728741#comment-16728741 ]
Sean Finan commented on CTAKES-449:
-----------------------------------
The CleartkExtractor contexts each have select() methods that call the
JCasUtil select methods. I think a new FS (feature structure) iteration is
done for each of these features (this engine has 9). In a large file that is
going to take a long time. I don't think cTAKES can really handle this any
better unless we write our own implementation of CleartkExtractor.Context
that takes a window of types ...
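
To illustrate the kind of window-based lookup I have in mind, here is a minimal sketch in plain Java. It is not ClearTK or cTAKES code: Span and TokenWindow are hypothetical stand-ins, and the sketch assumes the tokens do not overlap one another. The tokens are sorted once per document, and each preceding/following window is then found by binary search instead of a fresh index iteration per feature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a UIMA annotation: only begin/end offsets. */
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** Hypothetical helper: sorts the tokens once, then serves windows by binary search. */
final class TokenWindow {
    private final List<Span> tokens;

    /** Assumes the tokens do not overlap one another. */
    TokenWindow(List<Span> unsorted) {
        tokens = new ArrayList<>(unsorted);
        tokens.sort(Comparator.comparingInt((Span s) -> s.begin));
    }

    /** Up to n tokens that end at or before focus.begin, in document order. */
    List<Span> preceding(Span focus, int n) {
        int hi = firstIndex(focus.begin, true);
        return new ArrayList<>(tokens.subList(Math.max(0, hi - n), hi));
    }

    /** Up to n tokens that begin at or after focus.end, in document order. */
    List<Span> following(Span focus, int n) {
        int lo = firstIndex(focus.end, false);
        return new ArrayList<>(tokens.subList(lo, Math.min(tokens.size(), lo + n)));
    }

    /**
     * Binary search. With byEnd, returns the index of the first token whose
     * end is greater than offset; otherwise the index of the first token
     * whose begin is at or after offset.
     */
    private int firstIndex(int offset, boolean byEnd) {
        int lo = 0;
        int hi = tokens.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            Span t = tokens.get(mid);
            boolean before = byEnd ? t.end <= offset : t.begin < offset;
            if (before) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

One TokenWindow per document could in principle be shared by all 9 features, so the sort cost is paid once and each lookup is logarithmic in the number of tokens. Whether this can be wired into CleartkExtractor.Context cleanly is something I have not checked.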
> PolarityCleartkAnalysisEngine slow for large documents
> ------------------------------------------------------
>
> Key: CTAKES-449
> URL: https://issues.apache.org/jira/browse/CTAKES-449
> Project: cTAKES
> Issue Type: Improvement
> Components: ctakes-assertion
> Reporter: Dmitriy Dligach
> Assignee: Sean Finan
> Priority: Critical
> Fix For: 4.0.1
>
>
> As soon as I add the negation AE at the end of my pipeline:
>
>     aggregateBuilder.add(
>         PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
>
> the pipeline becomes 50-100 times slower. This likely has to do with the
> following line in AssertionCleartkAnalysisEngine:
>
>     List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas,
>         Sentence.class, entityOrEventMention.getBegin(),
>         entityOrEventMention.getEnd()));
>
> I am running the pipeline on large files (i.e. files with a large number of
> sentences). The slowdown is caused by the code obtaining all sentences in
> the document for each identified annotation.
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
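
For the selectCovering call quoted above, one possible way to avoid walking the sentences for every mention is to index them once per document. The sketch below is plain Java, not the cTAKES fix: TextSpan and SentenceIndex are hypothetical names, and it assumes sentences do not overlap. uimaFIT's JCasUtil.indexCovering may give a comparable one-pass map, though I have not measured it on this pipeline.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a Sentence or mention annotation. */
final class TextSpan {
    final int begin;
    final int end;

    TextSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** Hypothetical index: built once per document, queried once per mention. */
final class SentenceIndex {
    private final TreeMap<Integer, TextSpan> byBegin = new TreeMap<>();

    /** Assumes sentences do not overlap, so each begin offset is unique. */
    SentenceIndex(List<TextSpan> sentences) {
        for (TextSpan s : sentences) {
            byBegin.put(s.begin, s);
        }
    }

    /** Returns the sentence covering [begin, end], or null if none does. */
    TextSpan covering(int begin, int end) {
        // The only candidate is the last sentence starting at or before begin.
        Map.Entry<Integer, TextSpan> entry = byBegin.floorEntry(begin);
        if (entry == null) {
            return null;
        }
        TextSpan s = entry.getValue();
        return s.end >= end ? s : null;
    }
}
```

Building the index is a single pass over the sentences, and each lookup is logarithmic in the sentence count, rather than a scan that restarts for every identified annotation.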