Re: [VOTE] Release Apache cTAKES 3.2.1 (rc2)

2014-12-09 Thread Miller, Timothy
Yep, sounds like a plan.
+1 from me!
Tim

On 12/05/2014 11:20 AM, Chen, Pei wrote:
 Thanks for testing Tim.
 I don't think AggregatePlaintextProcessor.xml (w/o UMLS) does much actually.  
  We can fix the descriptor xml in the next patch irrespective [1].
 I updated the documentation[2] to say 
 - if you want something simple to test, try the 
 SentencesAndTokensAggregate.xml in the meantime.  
 - default to AggregatePlaintextFastUMLSProcessor.xml 
 - made UMLS 'Recommended' rather than 'Optional'.  Otherwise, you'll just 
 have a dictionary with 5 dummy terms.

 [1] https://issues.apache.org/jira/browse/CTAKES-340
 [2] https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2 



 -----Original Message-----
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
 Sent: Thursday, December 04, 2014 9:34 AM
 To: dev@ctakes.apache.org
 Subject: Re: [VOTE] Release Apache cTAKES 3.2.1 (rc2)

 I downloaded the src and bin and did the following (in xubuntu):

 * Compiled src from command line using mvn compile
 * Ran fast dictionary pipeline in src package (using mvn compile -P runCVD) 
 -- required setting UMLS user and password in dictionary descriptor
 * Extracted resources into bin release and successfully ran runCVD.sh -- 
 required setting UMLS user and password in dictionary descriptor

 There was one issue with the non-UMLS pipeline -- was missing a close bracket 
 in the descriptor xml. Added it back in and it worked fine. Is there a way to 
 just patch that file w/o re-doing the whole RC?
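 A missing close bracket like the one described above makes the descriptor malformed XML, so a quick well-formedness check catches it before the pipeline is launched. The following is a minimal sketch using only the Python standard library; the XML fragments are hypothetical and are not copied from the cTAKES descriptors.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical descriptor fragments for illustration only.
good = "<analysisEngineDescription><primitive>false</primitive></analysisEngineDescription>"
# Same fragment with the closing bracket of </primitive> dropped.
broken = "<analysisEngineDescription><primitive>false</primitive</analysisEngineDescription>"

ok_good, _ = check_well_formed(good)
ok_broken, message = check_well_formed(broken)
print("good parses:", ok_good)
print("broken parses:", ok_broken, "-", message)
```

 To check a file on disk rather than a string, ET.parse(path) raises the same ParseError and reports the line and column of the problem.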

 Tim


 On 12/01/2014 03:21 PM, Pei Chen wrote:

 This is a call for a vote on releasing the following candidate (rc2) as 
 Apache cTAKES 3.2.1.

 The major changes include:

 - New optional Temporal component (Time + Event Relationships models now 
 available)

 - Other bug fixes/enhancements from Jira

 I manually downloaded the bin as well as resources and tried the CVD with the 
 AggregatePlaintextFastUMLSProcessor.xml and 
 AggregatePlaintextUMLSProcessor.xml.
 Would be great if folks have time to test/verify, especially if you opened any 
 of the Jira issues below, to ensure the bugs have been fixed/integrated.


 For more detailed information on the changes/release notes, please visit:

 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12326778

 The release was made using the cTAKES release process documented here:
 http://ctakes.apache.org/ctakes-release-guide.html

 The candidate is available at: 
 https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.1-rc2/apache-ctakes-3.2.1-src.tar.gz
 /.zip

 The tag to be voted on: 
 http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.1-rc2

 The MD5 checksum of the tarball can be found at: 
 https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.1-rc2/apache-ctakes-3.2.1-src.tar.gz.md5
 /.zip.md5

 The signature of the tarball can be found at:
 https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.1-rc2/apache-ctakes-3.2.1-src.tar.gz.asc
 /.zip.asc

 Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
 https://dist.apache.org/repos/dist/release/ctakes/KEYS
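 For testers who want to script the checksum comparison, here is a minimal sketch in Python. It assumes the .md5 file contains the 32-character hex digest as one whitespace-separated token; that layout is an assumption, since .md5 files are not all formatted the same way. The demo hashes a throwaway temporary file rather than the actual release tarball.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, md5_file_text):
    """True if the computed digest appears as a token in the .md5 text.

    Assumption: the digest is stored as a single hex token, possibly
    followed or preceded by a file name.
    """
    tokens = md5_file_text.lower().split()
    return md5_of_file(path) in tokens


# Demo on a temporary file standing in for a downloaded tarball.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for a release archive")
    demo_path = tmp.name

expected_text = md5_of_file(demo_path) + "  demo-archive.tar.gz\n"
print("checksum matches:", matches_md5(demo_path, expected_text))
print("tampered checksum matches:", matches_md5(demo_path, "0" * 32))
os.remove(demo_path)
```

 The checksum only detects a corrupted download. Checking the .asc signature against the KEYS file is a separate step, typically done with gpg --import KEYS followed by gpg --verify.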

 Please vote on releasing these packages as Apache cTAKES 3.2.1. The vote is 
 open for at least the next 72 hours.

 The vote passes if at least three binding +1 votes are cast.
 [ ] +1 Release the packages as Apache cTAKES 3.2.1
 [ ] -1 Do not release the packages because...

 Also, the convenience binary can be found at:
 https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.1-rc2/apache-ctakes-3.2.1-bin.tar.gz.md5
 /.zip

 Thanks!





RE: Scaling cTakes

2014-12-09 Thread Finan, Sean
Hi Brandon,

You are welcome.  I was hoping that you'd get the note processing time down to 
under a second with the different lookup, but I guess not.  I think that any 
optimization from here really depends upon what information you want to extract 
from the notes.
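For reference, the figures quoted in this thread can be put on a common scale: the original report of ~3.5-4 notes a minute works out to roughly 15-17 seconds per note, and a fourfold speedup brings that to roughly 3.75-4.3 seconds per note, still well above one second. The sketch below does that arithmetic; it assumes "quadrupling" means exactly 4x and that throughput is steady, neither of which the thread states precisely.

```python
def seconds_per_note(notes_per_minute):
    """Convert a throughput in notes per minute to seconds per note."""
    return 60.0 / notes_per_minute


baseline_rates = (3.5, 4.0)  # notes per minute, from the original report
speedup = 4.0                # assumption: "quadrupling" taken as exactly 4x
target_seconds = 1.0         # the "under a second" goal

for rate in baseline_rates:
    before = seconds_per_note(rate)
    after = seconds_per_note(rate * speedup)
    remaining = after / target_seconds  # further speedup needed to hit target
    print(
        f"{rate} notes/min: {before:.2f} s/note before, "
        f"{after:.2f} s/note after, {remaining:.2f}x more needed"
    )
```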

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Tuesday, December 09, 2014 9:13 AM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Thanks again, Sean, for the advice.  Just changing the pipeline to use the 
fast dictionary quadrupled the processing speed.  Any other suggestions 
on performance tuning would be great!

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Friday, December 05, 2014 1:14 PM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Hi Brandon,

It sounds like you've got a decent pipeline set up.  To increase the speed you 
could try swapping out use of ctakes-dictionary-lookup with 
ctakes-dictionary-lookup-fast in the AE.  Check 
ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for 
an example.  As for the CASPool, I don't think that it will make any difference 
for cTakes.

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Friday, December 05, 2014 12:40 PM
To: dev@ctakes.apache.org
Subject: Scaling cTakes

Hi,

I'm new to cTakes and the UIMA framework.  I've read most of the UIMA 
documentation and was able to take the BagofCUIGenerator example and modify it to 
read notes from a DB, process them using the UMLS AE in the clinical-pipeline with 
a local DB version of UMLS, and output the CUIs to a DB.  However, the problem 
I'm having is that it's extremely slow: ~3.5-4 notes a minute.  I was hoping I could 
get some hints or advice on speeding the process up.  I read there's a patch 
for LVG, but wasn't quite sure how to implement it.  Also, from testing using the 
CPE GUI, I don't notice any difference in processing time when adjusting the 
CASPool setting.  Some advice on the CASPool would be appreciated also.

Thanks,
Brandon

