RE: Scaling cTakes

2014-12-09 Thread Finan, Sean
Hi Brandon,

You are welcome.  I was hoping that you'd get the note processing time down to 
under a second with the different lookup, but I guess not.  I think that any 
optimization from here really depends upon what information you want to extract 
from the notes.

Sean
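
For reference, the figures quoted in this thread can be turned into seconds per 
note with a little arithmetic.  The sketch below assumes the "quadrupling" 
applies to the ~3.5-4 notes a minute reported in the original post and that 
notes are processed one at a time; the class and method names are illustrative 
only and are not part of cTakes.

```java
public class NoteThroughput {
    // Seconds spent per note at a given throughput in notes per minute.
    static double secondsPerNote(double notesPerMinute) {
        return 60.0 / notesPerMinute;
    }

    public static void main(String[] args) {
        double[] baseline = {3.5, 4.0}; // notes/minute from the original post
        double speedup = 4.0;           // assumed meaning of "quadrupling"
        for (double rate : baseline) {
            double faster = rate * speedup;
            System.out.printf(
                "%.1f notes/min = %.2f s/note; after 4x: %.1f notes/min = %.2f s/note%n",
                rate, secondsPerNote(rate), faster, secondsPerNote(faster));
        }
    }
}
```

Under those assumptions the pipeline goes from roughly 15-17 seconds per note 
to roughly 4 seconds per note, which is still above the under-a-second figure 
mentioned above.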

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Tuesday, December 09, 2014 9:13 AM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Thanks again Sean for the advice.  Just changing the pipeline to use the 
fast dictionary quadrupled the processing speed.  Any other suggestions 
on performance tuning would be great!

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Friday, December 05, 2014 1:14 PM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Hi Brandon,

It sounds like you've got a decent pipeline set up.  To increase the speed you 
could try replacing ctakes-dictionary-lookup with 
ctakes-dictionary-lookup-fast in the AE.  Check 
ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for 
an example.  As for the CASPool, I don't think that it will make any difference 
for cTakes.

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Friday, December 05, 2014 12:40 PM
To: dev@ctakes.apache.org
Subject: Scaling cTakes

Hi,

I'm new to cTakes and the UIMA framework.  I've read most of the UIMA 
documentation and was able to take the BagofCUIGenerator example and modify it 
to read notes from a DB, process them using the UMLS AE in the clinical-pipeline 
with a local DB version of UMLS, and output the CUIs to a DB.  However, the 
problem I'm having is that it's extremely slow: ~3.5-4 notes a minute.  I was 
hoping I could get some hints or advice on speeding the process up.  I read 
there's a patch for LVG, but wasn't quite sure how to implement it.  Also, from 
testing with the CPE GUI, I don't notice any difference in processing time when 
adjusting the CASPool setting.  Some advice on the CASPool would be appreciated 
also.

Thanks,
Brandon


IMPORTANT WARNING: The information in this message (and the documents attached 
to it, if any) is confidential and may be legally privileged. It is intended 
solely for the addressee. Access to this message by anyone else is 
unauthorized. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken, or omitted to be taken, in reliance on it is 
prohibited and may be unlawful. If you have received this message in error, 
please delete all electronic copies of this message (and the documents attached 
to it, if any), destroy any hard copies you may have created and notify me 
immediately by replying to this email. Thank you.

Geisinger Health System utilizes an encryption process to safeguard Protected 
Health Information and other confidential data contained in external e-mail 
messages. If email is encrypted, the recipient will receive an e-mail 
instructing them to sign on to the Geisinger Health System Secure E-mail 
Message Center to retrieve the encrypted e-mail.



RE: Scaling cTakes

2014-12-05 Thread Finan, Sean
Hi Brandon,

It sounds like you've got a decent pipeline set up.  To increase the speed you 
could try replacing ctakes-dictionary-lookup with 
ctakes-dictionary-lookup-fast in the AE.  Check 
ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for 
an example.  As for the CASPool, I don't think that it will make any difference 
for cTakes.

Sean


RE: Scaling cTakes

2014-12-05 Thread Savova, Guergana
Hi Brandon,
I believe our estimate of how long it takes to process a document is under a 
second with the fast dictionary lookup. Sean can provide more details. 
--Guergana


RE: Scaling cTakes

2014-12-05 Thread Geise, Brandon D.
Thanks Sean.  I'll take a look and see if this speeds the pipeline up.

Thanks,
Brandon


Re: Scaling cTakes

2014-12-05 Thread jay vyas
On a tangential note, we do have an example of running cTakes on a massively
parallel system like Spark/Hadoop.

https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

If your problem is embarrassingly parallel, you can use MapReduce/Spark to
distribute your app, using that as a template (spark streaming can )
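
The idea of an embarrassingly parallel job can be sketched on a single JVM 
with a thread pool: each note is handled independently, so the work can be 
split across workers with no coordination.  This is not the code from the 
sandbox project above and it does not use Spark or Hadoop.  The class name and 
the processNote method are hypothetical placeholders for "run one note through 
the pipeline".  The thread does not say whether a cTakes pipeline instance can 
be shared between threads, so a real version would need to settle that first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelNotes {
    // Hypothetical stand-in for processing one note; it only counts
    // whitespace-separated tokens so the example stays self-contained.
    static int processNote(String note) {
        String trimmed = note.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Runs every note on a fixed-size pool; results come back in input order.
    static List<Integer> processAll(List<String> notes, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String note : notes) {
                Callable<Integer> task = () -> processNote(note);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that note is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> notes = Arrays.asList(
            "patient denies chest pain", "no acute distress", "");
        System.out.println(processAll(notes, 2)); // prints [4, 3, 0]
    }
}
```

Because no note depends on another, the same shape carries over to a cluster 
framework, where each worker would process its own partition of the notes.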




-- 
jay vyas


RE: Scaling cTakes

2014-12-05 Thread Geise, Brandon D.
Thanks Jay, I'll have to take a look at this too. 
