Thanks Jay, I'll have to take a look at this too.

-----Original Message-----
From: jay vyas [mailto:jayunit100.apa...@gmail.com]
Sent: Friday, December 05, 2014 2:40 PM
To: dev@ctakes.apache.org
Subject: Re: Scaling cTakes

on a tangential note, we do have an example of running ctakes in a massively parallel system like spark/hadoop.

https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

if your problem is embarrassingly parallelizable, you can use mapreduce/spark to distribute your app using that as a template (spark streaming can )

On Fri, Dec 5, 2014 at 1:29 PM, Geise, Brandon D. <bdge...@geisinger.edu> wrote:

> Thanks Sean. I'll take a look and see if this speeds the pipeline up.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 1:14 PM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Hi Brandon,
>
> It sounds like you've got a decent pipeline set up. To increase the
> speed you could try swapping out use of ctakes-dictionary-lookup with
> ctakes-dictionary-lookup-fast in the AE. Check
> ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml
> for an example. As for the CASPool, I don't think that it will
> make any difference for cTakes.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdge...@geisinger.edu]
> Sent: Friday, December 05, 2014 12:40 PM
> To: dev@ctakes.apache.org
> Subject: Scaling cTakes
>
> Hi,
>
> I'm new to cTakes and the UIMA framework. I've read most of the UIMA
> documentation and was able to take the BagofCUIGenerator example and
> modify it to read notes from a DB, process them using the UMLS AE in the
> clinical-pipeline with a local DB version of UMLS, and output the
> CUIs to a DB. However, the problem I'm having is that it's extremely slow:
> ~3.5-4 notes a minute. I was hoping I could get some hints or advice
> on speeding the process up. I read there's a patch for LVG, but
> wasn't quite sure how to implement it. Also, from testing using the CPE
> GUI, I don't notice any difference in processing time when adjusting the
> CASPool setting.
> Some advice on the CASPool would be appreciated also.
>
> Thanks,
> Brandon

--
jay vyas
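
For the "embarrassingly parallel" case mentioned above, a minimal sketch of the pattern may help. It is not cTAKES or Spark code. `build_pipeline` and its `annotate` function are hypothetical stand-ins for constructing an analysis engine and running it on one note, and the fake "CUI" matching is only there so the example runs. The point is the structure: each worker builds its pipeline once, reuses it for every note it handles, and notes are processed independently of each other.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One pipeline per worker thread, so the expensive setup is paid once per worker.
_local = threading.local()


def build_pipeline():
    """Hypothetical stand-in for expensive setup (loading dictionaries, models)."""

    def annotate(note_text):
        # Placeholder logic: treat tokens like "C0011849" as extracted CUIs.
        return [
            tok for tok in note_text.split()
            if tok.startswith("C") and tok[1:].isdigit()
        ]

    return annotate


def get_pipeline():
    # Lazily create the pipeline the first time this worker needs it.
    if not hasattr(_local, "pipeline"):
        _local.pipeline = build_pipeline()
    return _local.pipeline


def process_note(note):
    # Each note is independent of the others, which is what makes the job
    # embarrassingly parallel.
    note_id, text = note
    return note_id, get_pipeline()(text)


def process_notes(notes, workers=4):
    # Fan the notes out over a fixed pool of workers and collect the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_note, notes))
```

Usage would look like `process_notes([(1, "note text ..."), (2, "...")], workers=8)`, returning a dict keyed by note id. Threads in CPython will not speed up CPU-bound work, so this only illustrates the shape. In a JVM pipeline or a Spark job the same idea would presumably map to one pipeline instance per thread or per partition.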