Thanks Jay, I'll have to take a look at this too. 

-----Original Message-----
From: jay vyas [mailto:jayunit100.apa...@gmail.com] 
Sent: Friday, December 05, 2014 2:40 PM
To: dev@ctakes.apache.org
Subject: Re: Scaling cTakes

On a tangential note, we do have an example of running cTAKES on a massively 
parallel system like Spark/Hadoop.

https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

If your problem is embarrassingly parallelizable, you can use MapReduce/Spark 
to distribute your app, using that as a template (Spark Streaming can )
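
A minimal sketch of that pattern, not taken from the sandbox project above: 
split the notes into independent partitions and map the same function over 
each one. Everything here is an assumption for illustration: annotate_note 
is a hypothetical stand-in for running one note through a cTAKES pipeline, 
and a local thread pool stands in for the cluster's executors.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_note(note_text):
    """Hypothetical stand-in for running one note through a cTAKES pipeline.

    It only picks out tokens shaped like CUIs ("C" followed by digits) so
    that the sketch runs without cTAKES installed.
    """
    return [tok for tok in note_text.split()
            if tok.startswith("C") and tok[1:].isdigit()]


def partition(items, n_parts):
    """Split items into n_parts round-robin slices that share no state."""
    return [items[i::n_parts] for i in range(n_parts)]


def process_partition(notes):
    """Annotate every (note_id, text) pair in one partition."""
    # Assumption: in a real job the pipeline would be built once here, per
    # partition, so that start-up cost is not paid again for every note.
    return [(note_id, annotate_note(text)) for note_id, text in notes]


def run(notes, workers=4):
    """Map process_partition over the partitions and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_partition, partition(notes, workers))
    return dict(pair for part in results for pair in part)


if __name__ == "__main__":
    sample = [(1, "C0011849 noted"), (2, "no findings"),
              (3, "C0020538 and C0011849")]
    print(run(sample, workers=2))
```

The per-partition function is the piece you would hand to a framework's 
partition-level map; the thread pool is only there so the sketch runs on 
one machine.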




On Fri, Dec 5, 2014 at 1:29 PM, Geise, Brandon D. <bdge...@geisinger.edu>
wrote:

> Thanks Sean.  I'll take a look and see if this speeds the pipeline up.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 1:14 PM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Hi Brandon,
>
> It sounds like you've got a decent pipeline set up.  To increase the 
> speed you could try swapping out ctakes-dictionary-lookup for 
> ctakes-dictionary-lookup-fast in the AE.  Check 
> ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml 
> for an example.  As for the CASPool, I don't think that it will 
> make any difference for cTAKES.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdge...@geisinger.edu]
> Sent: Friday, December 05, 2014 12:40 PM
> To: dev@ctakes.apache.org
> Subject: Scaling cTakes
>
> Hi,
>
> I'm new to cTAKES and the UIMA framework.  I've read most of the UIMA 
> documentation and was able to take the BagofCUIGenerator example and 
> modify it to read notes from a DB, process them using the UMLS AE in the 
> clinical-pipeline with a local DB version of UMLS, and output the 
> CUIs to a DB.  However, the problem I'm having is that it's extremely 
> slow: ~3.5-4 notes a minute.  I was hoping I could get some hints or 
> advice on speeding the process up.  I read there's a patch for LVG, but 
> wasn't quite sure how to implement it.  Also, from testing using the CPE 
> GUI, I don't notice any difference in processing time when adjusting the 
> CASPool setting.  Some advice on the CASPool would be appreciated also.
>
> Thanks,
> Brandon


--
jay vyas
