RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Finan, Sean Fri, 30 Jun 2017 11:31:55 -0700

Hey Tim,

I have recently been testing with the "smoker" notes in ctakes-examples-res, 
and using your new sentence detector (Lumpy) has definitely been the way to go 
for those notes, which have random CR/LF breaks within sentences.  It is great 
that we have some example notes in ctakes that can show off your work.


Cheers,
Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
Sent: Friday, June 30, 2017 11:03 AM
To: dev@ctakes.apache.org
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Hi Tim,

Good point, but I happen to be using the ctakes-core sentence detector.

Dima



> On Jun 23, 2017, at 06:31, Miller, Timothy 
> <timothy.mil...@childrens.harvard.edu> wrote:
> 
> Something I just thought of is that if you are using the new (beta) sentence 
> detector trained on Mimic, it is a bit of a "lumper" rather than a 
> "splitter," meaning it is more likely to miss a sentence break and make 
> longer sentences, sometimes absurdly long if there are no clear cues. I know 
> that will slow down the constituency parser and dependency parser, but I am 
> not sure why it would only slow down when negation processing is added. So, 
> not a solution, but something to keep in mind while debugging, especially if 
> it interacts with Steve and Sean's feedback.
> Tim
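
One way to check whether "lumped" sentences are a factor is to count tokens per sentence and flag the outliers. The following is only a minimal sketch in plain Java: the class name, the whitespace tokenization, and the threshold are illustrative assumptions, not part of cTAKES, and in a real pipeline the sentence strings would come from the sentence annotations in the CAS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: flags sentences whose token count exceeds a limit. */
final class LongSentenceCheck {

    /** Counts whitespace-separated tokens (a crude stand-in for a real tokenizer). */
    static int tokenCount(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns the indexes of sentences with more than maxTokens tokens. */
    static List<Integer> indexesOver(List<String> sentences, int maxTokens) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (tokenCount(sentences.get(i)) > maxTokens) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Assumed threshold of 5 tokens, chosen only to keep the demo small.
        List<String> sentences = Arrays.asList(
            "No fever or chills.",
            "Patient denies chest pain shortness of breath nausea vomiting");
        System.out.println("Suspiciously long sentences: " + indexesOver(sentences, 5));
    }
}
```

If many sentences in the slow documents are flagged, that would point toward the sentence detector; if none are, the long-sentence explanation becomes less likely.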
> 
> 
> ________________________________________
> From: Dligach, Dmitriy <ddlig...@luc.edu>
> Sent: Wednesday, June 21, 2017 9:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly 
> [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn't have 
> anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. 
> The reason you and others haven't seen this in other pipelines is that you've 
> probably been processing relatively small files.
> 
> I am processing MIMIC patient files, which typically have thousands of words. 
> I just tried to process 300 files from the THYME corpus (where the files have 
> hundreds of words) and the slowdown was barely noticeable. When running the 
> same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a 
>> document id (type DocumentID) in the cas.  However, this shouldn't be a 
>> problem as each document will be assigned a unique id "UnknownDocument"{###} 
>> where {###} is a number incremented per new document with an unknown id.  
>> The message that you are seeing is just a warning.  The code that fetches 
>> the documentID and creates a default is very simple and should not take any 
>> real processing time.
>> 
>> The call to get document id is the very first line in 
>> AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe 
>> that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline 
>> through a debugger.  Alternatively, there are a couple of log.debug messages 
>> in that class, so you could lower your log4j level to DEBUG and see if you 
>> can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline 
>> the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
>> aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As 
>> soon as I remove these two AEs at the end of my pipeline, it runs very fast 
>> again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
>
