Hi Dima,

It looks like the UriCollectionReader that you are using never sets a document 
id (type DocumentID) in the CAS.  However, this shouldn't be a problem, as each 
document will be assigned a unique id "UnknownDocument"{###}, where {###} is a 
number incremented for each new document with an unknown id.  The message that 
you are seeing is just a warning.  The code that fetches the document ID and 
creates a default is very simple and should not take any real processing time.
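
As a rough illustration of that fallback, the logic amounts to something like 
the sketch below.  This is not the actual cTAKES code: the class and method 
names are made up for the example, the id is passed in as a plain String 
rather than read from the CAS, and the starting value of the counter is an 
assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the default-id logic; not the cTAKES implementation.
final class DefaultDocumentIdSketch {
    private static final String UNKNOWN_PREFIX = "UnknownDocument";
    // Counts documents seen without an id (starting value is an assumption).
    private static final AtomicInteger UNKNOWN_COUNT = new AtomicInteger();

    // Returns the given id, or a generated default when none was set.
    static String resolveDocumentId(String idFromCas) {
        if (idFromCas != null && !idFromCas.isEmpty()) {
            return idFromCas;
        }
        // Mirrors the warning in spirit only; the real message comes from log4j.
        System.err.println("WARN Unable to find DocumentIDAnnotation");
        return UNKNOWN_PREFIX + UNKNOWN_COUNT.incrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println(resolveDocumentId("note_001.txt"));
        System.out.println(resolveDocumentId(null));
        System.out.println(resolveDocumentId(null));
    }
}
```

The point is only that a null check, a warning and a counter increment are 
all that happen here, so this step should cost next to nothing.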

The call to get document id is the very first line in 
AssertionCleartkAnalysisEngine:
  @Override
  public void process(JCas jCas) throws AnalysisEngineProcessException
  {
    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);

So, the slowdown occurring after the warning message leads me to believe that 
the problem lies later in the process ...

My suggestion is that you put a breakpoint there and run your pipeline through 
a debugger.  Alternatively, there are a couple of log.debug messages in that 
class, so you could set your log4j level to DEBUG and see if you can narrow 
down the problem.  Add more debug statements if it helps.
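
If a debugger is inconvenient, another way to narrow things down is to time 
the individual steps yourself.  The sketch below is plain Java with no cTAKES 
or UIMA dependency; StageTimerSketch and the stage names are hypothetical, and 
in a real annotator you would wrap the suspect calls inside process(...) 
instead of the dummy work shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that accumulates wall-clock time per named stage.
final class StageTimerSketch {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    // Runs the work and adds its elapsed time to the stage's running total.
    void time(String stage, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            totalNanos.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Total time recorded for a stage, in milliseconds (0 if never timed).
    long totalMillis(String stage) {
        return totalNanos.getOrDefault(stage, 0L) / 1_000_000;
    }

    // Prints one line per stage in the order the stages were first seen.
    void report() {
        for (String stage : totalNanos.keySet()) {
            System.out.println(stage + ": " + totalMillis(stage) + " ms");
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StageTimerSketch timer = new StageTimerSketch();
        timer.time("get document id", () -> { /* cheap step */ });
        timer.time("later step", () -> sleepQuietly(50)); // stands in for slow work
        timer.report();
    }
}
```

Because Runnable cannot throw checked exceptions, code that throws 
AnalysisEngineProcessException would need a small custom functional interface 
or a try/catch inside the lambda.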

At any rate, I have not seen this problem in other pipelines.

Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
Sent: Wednesday, May 24, 2017 10:34 AM
To: cTAKES Developer list
Subject: negation/uncertainty: pipeline runs very slowly

Dear cTAKES developers, 

I am observing something strange. As soon as I add at the end of my pipeline 
the uncertainty/negation AEs:

aggregateBuilder.add( 
PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
aggregateBuilder.add( 
UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );

the pipeline becomes 10-20 times slower. I just confirmed this again. As soon 
as I remove these two AEs at the end of my pipeline, it runs very fast again.

It seems to get stuck often right after it outputs this warning:
WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation

If I remove the two AEs, this warning disappears.

The full pipeline is here:
https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

Any clues?

Thank you very much,

Dima


