RE: Scaling cTakes
Hi Brandon,

You are welcome. I was hoping that you'd get the note processing time down to under a second with the different lookup, but I guess not. I think that any optimization from here really depends upon what information you want to extract from the notes.

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Tuesday, December 09, 2014 9:13 AM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Thanks again Sean for the advice. Just changing the pipeline to use the fast dictionary quadrupled the processing speed. Any other suggestions on performance tuning would be great!

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Friday, December 05, 2014 1:14 PM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Hi Brandon,

It sounds like you've got a decent pipeline set up. To increase the speed you could try swapping out use of ctakes-dictionary-lookup with ctakes-dictionary-lookup-fast in the AE. Check ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for an example. As for the CASPool, I don't think that it will make any difference for cTakes.

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Friday, December 05, 2014 12:40 PM
To: dev@ctakes.apache.org
Subject: Scaling cTakes

Hi,

I'm new to cTakes and the UIMA framework. I've read most of the UIMA documentation and was able to take the BagofCUIGenerator example and modify it to read notes from a DB, process them using the UMLS AE in the clinical-pipeline with a local DB version of UMLS, and output the CUIs to a DB. However, the problem I'm having is that it's extremely slow: ~3.5-4 notes a minute. I was hoping I could get some hints or advice on speeding the process up. I read there's a patch for LVG, but wasn't quite sure how to implement it. Also, from testing using the CPE GUI, I don't notice any difference in processing time when adjusting the CASPool setting. Some advice on the CASPool would be appreciated also.

Thanks,
Brandon

IMPORTANT WARNING: The information in this message (and the documents attached to it, if any) is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken, or omitted to be taken, in reliance on it is prohibited and may be unlawful. If you have received this message in error, please delete all electronic copies of this message (and the documents attached to it, if any), destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you.

Geisinger Health System utilizes an encryption process to safeguard Protected Health Information and other confidential data contained in external e-mail messages. If email is encrypted, the recipient will receive an e-mail instructing them to sign on to the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.
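To put the numbers from this exchange side by side, here is a small sketch that converts the reported throughput (~3.5-4 notes a minute) into seconds per note, before and after the roughly fourfold speedup reported for the fast dictionary. The class and method names are made up for illustration, and the factor of 4 is taken literally from "quadrupling"; actual timings will depend on the notes and the hardware.

```java
// Illustrative arithmetic only; names here are hypothetical, not part of cTakes.
class ThroughputEstimate {

    // Seconds spent per note at a given throughput in notes per minute.
    static double secondsPerNote(double notesPerMinute) {
        return 60.0 / notesPerMinute;
    }

    public static void main(String[] args) {
        double[] reportedRates = {3.5, 4.0}; // notes per minute, from the original post
        double speedup = 4.0;                // "quadrupling" with the fast dictionary

        for (double rate : reportedRates) {
            double before = secondsPerNote(rate);
            double after = secondsPerNote(rate * speedup);
            System.out.printf("%.1f notes/min: %.2f s/note before, %.2f s/note after%n",
                    rate, before, after);
        }
    }
}
```

Under these assumptions the per-note time drops from about 15-17 seconds to about 4 seconds, which is still above the sub-second figure mentioned in the thread.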
RE: Scaling cTakes
Hi Brandon,

It sounds like you've got a decent pipeline set up. To increase the speed you could try swapping out use of ctakes-dictionary-lookup with ctakes-dictionary-lookup-fast in the AE. Check ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for an example. As for the CASPool, I don't think that it will make any difference for cTakes.

Sean
RE: Scaling cTakes
Hi Brandon,

Our estimate of how long it takes to process a document is under a second with the fast dictionary lookup I believe. Sean can provide more details.

--Guergana
RE: Scaling cTakes
Thanks Sean. I'll take a look and see if this speeds the pipeline up.

Thanks,
Brandon
Re: Scaling cTakes
On a tangential note, we do have an example of running cTakes in a massively parallel system like Spark/Hadoop:

https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

If your problem is embarrassingly parallelizable, you can use MapReduce/Spark to distribute your app using that as a template (spark streaming can ).

--
jay vyas
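The "embarrassingly parallel" idea can be sketched on a single machine with plain Java threads. This is not the Spark template linked above and uses no cTakes or UIMA calls: `processAll` and the word-counting lambda are hypothetical stand-ins, and the sketch assumes each note can be processed independently (for example, with every worker owning its own pipeline instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: fans independent notes out over a fixed thread pool.
class ParallelNotes {

    // Applies `annotate` to every note on `threads` worker threads and
    // returns the results in the same order as the input list.
    static <R> List<R> processAll(List<String> notes,
                                  Function<String, R> annotate,
                                  int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (String note : notes) {
                Callable<R> task = () -> annotate.apply(note);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // blocks until that note is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> notes = List.of("note one", "a second note", "third");
        // Stand-in for a real pipeline call: just counts whitespace-separated words.
        List<Integer> counts = processAll(notes, n -> n.split("\\s+").length, 2);
        System.out.println(counts);
    }
}
```

The same shape (a list of independent notes mapped through one function) is what a MapReduce or Spark job would distribute across machines instead of threads.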
RE: Scaling cTakes
Thanks Jay, I'll have to take a look at this too.