Hi Dima,
It looks like the UriCollectionReader that you are using never sets a document
id (type DocumentID) in the CAS. However, this shouldn't be a problem, as each
document will be assigned a unique id "UnknownDocument"{###}, where {###} is a
number incremented for each new document with an unknown id. The message that
you are seeing is just a warning. The code that fetches the document id and
creates a default is very simple and should not take any real processing time.
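To give a sense of how little work is involved, here is a rough sketch of that kind of fallback. This is not the actual cTAKES code: the class name FallbackDocumentIds and its resolve method are made up for illustration, and starting the counter at 1 is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, NOT the cTAKES implementation: hands out fallback ids
// of the form "UnknownDocument<n>" when a document carries no id of its own.
final class FallbackDocumentIds {
    // Assumption: numbering starts at 1 and is per instance of this helper.
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the given id if one is present, otherwise a freshly numbered default.
    String resolve(String idFromCas) {
        if (idFromCas != null && !idFromCas.isEmpty()) {
            return idFromCas;
        }
        return "UnknownDocument" + counter.incrementAndGet();
    }
}
```

A lookup plus a counter increment like this is constant-time work per document, so it is very unlikely to explain a 10-20x slowdown.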
The call to get the document id is the very first line of process(..) in
AssertionCleartkAnalysisEngine:
    @Override
    public void process(JCas jCas) throws AnalysisEngineProcessException
    {
        String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
So, the slowdown occurring after the warning message leads me to believe that
the problem lies later in the process ...
My suggestion is that you put a breakpoint there and run your pipeline through
a debugger. Alternatively, there are a couple of log.debug messages in that
class, so you could lower your log4j level to DEBUG and see if you can
narrow down the problem. Add more debug statements if it helps.
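If a debugger is inconvenient, another option is to time the suspect steps directly. The following is only a sketch under my own assumptions: StageTimer is a hypothetical helper, not something that exists in cTAKES, and it uses plain System.nanoTime rather than any logging framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical timing helper: accumulates wall-clock time per named stage.
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the task and adds its elapsed time to the running total for the stage.
    void time(String stage, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a stage, or 0 if the stage never ran.
    long totalNanos(String stage) {
        return nanosByStage.getOrDefault(stage, 0L);
    }

    // One line per stage in first-seen order, suitable for a debug message.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(e.getValue() / 1_000_000).append(" ms\n");
        }
        return sb.toString();
    }
}
```

You could wrap individual blocks inside process(..) with time("some label", ...) and print report() at the end of the run. Note that Runnable cannot throw checked exceptions, so code that throws AnalysisEngineProcessException would need to be wrapped or timed inline with the same start/stop pattern.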
At any rate, I have not seen this problem in other pipelines.
Sean
-----Original Message-----
From: Dligach, Dmitriy [mailto:[email protected]]
Sent: Wednesday, May 24, 2017 10:34 AM
To: cTAKES Developer list
Subject: negation/uncertainty: pipeline runs very slowly
Dear cTAKES developers,
I am observing something strange. As soon as I add at the end of my pipeline
the uncertainty/negation AEs:
    aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
    aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
the pipeline becomes 10-20 times slower. I just confirmed this again. As soon
as I remove these two AEs at the end of my pipeline, it runs very fast again.
It seems to get stuck often right after it outputs this warning:
WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
If I remove the two AEs, this warning disappears.
The full pipeline is here:
https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
Any clues?
Thank you very much,
Dima