Re: HELP with making analyzing text with AggregatePlaintextUMLSProcessor.xml a script instead of manually clicking lots of files. I’m trying PIPER

2023-11-30 Thread James Masanz
Hi,

Welcome to the cTAKES community.

I can at least help out with the  "ArgumentValidationException: Unexpected
trailing value" error. It indicates that you're missing the *-p* parameter.

To run a piper file from the command line using the batch file, pass it
with -p, like this:

bin\runPiperFile.bat  *-p*  C:\Users\dwu\Documents\test\piper\mypiper.piper

The following page lists the other parameters for runPiperFile.bat
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
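
If you want to script this rather than type it each time, the command above can be assembled programmatically. This is only a sketch: the helper name build_piper_command and the example paths are illustrative assumptions, and the only thing taken from the message above is that the piper path must follow -p.

```python
from pathlib import PureWindowsPath


def build_piper_command(ctakes_home, piper_file):
    """Return the argument list for runPiperFile.bat with the required -p flag."""
    # Assumption: the batch file lives under <ctakes_home>\bin, as in the command above.
    launcher = PureWindowsPath(ctakes_home) / "bin" / "runPiperFile.bat"
    # Passing the piper path without -p is what triggers "Unexpected trailing value".
    return [str(launcher), "-p", str(PureWindowsPath(piper_file))]


if __name__ == "__main__":
    cmd = build_piper_command(
        r"C:\Users\dwu\Documents\apache-ctakes-4.0.0.1",
        r"C:\Users\dwu\Documents\test\piper\mypiper.piper",
    )
    print(" ".join(cmd))
```

On a Windows machine with cTAKES installed, the returned list could be handed to subprocess.run; that call is left out here so the sketch runs anywhere.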

Hope this helps,
  James

On Thu, Nov 30, 2023 at 9:25 AM Wu, Dee H. (HSC)  wrote:

>
> Dear community
>
> This is my first email to this group; I hope you can help. What I’d like is
> a one-pass approach to what I currently have to click through in
> runctakesCVD.bat. I need help turning the analysis of text with
> AggregatePlaintextUMLSProcessor.xml into a script, using a piper file,
> instead of manually clicking through lots of files.
>
> Here’s what I do now (manually):
>
> 1. From my base directory, C:\Users\dwu\Documents\apache-ctakes-4.0.0.1, run
>    bin\runctakesCVD.bat desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml
>    (this loads the engine from the command line).
> 2. Read in the text file (too bad that can’t be done from the command line;
>    I didn’t want to touch the Java at this point).
> 3. Hit Control-R (and/or Run -> Run AggregatePlaintextUMLSProcessor.xml)
>    from the menu.
> 4. Write out the XMI file (see visuals).
>
> I wish this were automated.
>
> Here’s what I run:
> bin\runPiperFile.bat C:\Users\dwu\Documents\test\piper\mypiper.piper
>
> Ideally it should do what
> desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml
> does.
>
> *Here’s my piper:*
>
> //   ***  Piper File  ***
> //   Created by DWU
> //   on November 19, 2023
>
> //  Text Files Reader
> //  Reads document texts from text files specified in a provided list.
> #   files  The text files to be loaded
> reader org.apache.ctakes.core.cr.TextReader files=C:\Users\dwu\Documents\test\input_dir\inputsample.txt
>
> // Load a simple token processing pipeline from another pipeline file
> load DefaultTokenizerPipeline.piper
>
> // Add non-core annotators
> add ContextDependentTokenizerAnnotator
> addDescription POSTagger
>
> // Add Chunkers
> load ChunkerSubPipe.piper
>
> // Default fast dictionary lookup
> add DefaultJCasTermAnnotator
>
> // Add Cleartk Entity Attribute annotators
> load AttributeCleartkSubPipe.piper
>
> //  XMI Writer
> //  Writes XMI files with full representation of input text and all extracted information.
> #   OutputDirectory  Output directory to write xmi files
> add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=C:\Users\dwu\Documents\test\output_dir
>
>
>
> *Here’s the output:*
>
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM  HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> Exception in thread "main" com.lexicalscope.jewel.cli.ArgumentValidationException: Unexpected trailing value (C:\Users\dwu\Documents\test\piper\mypiper.piper)
>     at com.lexicalscope.jewel.cli.ValidationErrorBuilderImpl.validate(ValidationErrorBuilderImpl.java:64)
>     at com.lexicalscope.jewel.cli.validation.ArgumentValidatorImpl.finishedProcessing(ArgumentValidatorImpl.java:104)
>     at com.lexicalscope.jewel.cli.ArgumentCollectionBuilder.processArguments(ArgumentCollectionBuilder.java:129)
>     at com.lexicalscope.jewel.cli.AbstractCliImpl.parseArguments(AbstractCliImpl.java:42)
>
>   

Re: Re: Re: Regarding Apache cTakes

2018-06-09 Thread James Masanz
I see a java.lang.OutOfMemoryError: GC overhead limit exceeded error, which
means the JVM is running short of heap. You could try changing the heap
settings in the java command from

-Xms512M -Xmx3g
to
-Xms3g  -Xmx6g
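
If you would rather not edit the script by hand, the same substitution can be applied to the text of the java command. This is a rough sketch; the helper name raise_heap and the sample command line are assumptions for illustration, and which launch script holds these flags depends on your install.

```python
import re


def raise_heap(command_text, xms="3g", xmx="6g"):
    """Replace the -Xms/-Xmx values in a java command line with larger ones."""
    # \S+ matches the existing size token, e.g. "512M" or "3g".
    text = re.sub(r"-Xms\S+", f"-Xms{xms}", command_text)
    return re.sub(r"-Xmx\S+", f"-Xmx{xmx}", text)


if __name__ == "__main__":
    # Illustrative command line, not copied from any particular script.
    before = "java -cp lib/* -Xms512M -Xmx3g org.apache.uima.tools.cvd.CVD"
    print(raise_heap(before))
```

The maximum heap still has to fit in the machine's physical memory, so -Xmx6g only makes sense on a machine with comfortably more than 6 GB of RAM.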



On Sat, Jun 9, 2018 at 4:45 PM, Ankit Bisht 
wrote:

> Hello Parth,
>
> I have already done that. Even after adding the UMLS credentials, I am still
> getting the error.
>
> -Ankit
>
> On Fri, Jun 8, 2018 at 11:00 PM, Parth Natu  wrote:
>
>>
>> Hi Ankit,
>>
>> Please add the following parameters before the "-cp" in the Java command
>> in the .sh file.
>>
>>
>> -Dctakes.umlsuser=[username] -Dctakes.umlspw=[password]
>>
>> Try executing the same and let me know the results
>>
>> Parth
>>
>> On Jun 9, 2018 at 7:29 AM, >
>> wrote:
>>
>> Hello Parth,
>>
>> Please find attached the .sh file. The command is:
>> bin/runctakesCVD.sh -desc desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
>> I am following this guide:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide
>> -Ankit
>>
>> On Fri, Jun 8, 2018 at 12:29 PM, Parth Natu  wrote:
>>
>>>
>>> Hello Ankit,
>>> Can you attach the edited .sh files and the command that you use to
>>> run it?
>>>
>>> Parth
>>>
>>> On Jun 8, 2018 at 9:50 PM, >
>>> wrote:
>>>
>>> Hello Gundolf,
>>>
>>> I have attached the snapshot of the error which I am getting.
>>> Also check my log files, this might help:
>>>
>>> 10:58:42.492 - 1: org.apache.uima.tools.cvd.MainFrame.handleException(526): SEVERE:
>>> Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor:
>>> file:/C:/Users/ankit/Desktop/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>> org.apache.uima.resource.ResourceInitializationException:
>>> Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor:
>>> file:/C:/Users/ankit/Desktop/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:371)
>>>     at org.apache.uima.tools.cvd.MainFrame.setupAE(MainFrame.java:1484)
>>>     at org.apache.uima.tools.cvd.MainFrame.loadAEDescriptor(MainFrame.java:476)
>>>     at org.apache.uima.tools.cvd.CVD.main(CVD.java:164)
>>> Caused by: org.apache.uima.resource.ResourceInitializationException:
>>> MESSAGE LOCALIZATION FAILED: Can't find resource for bundle
>>> java.util.PropertyResourceBundle, key Could not construct
>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>     ... 16 more
>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException:
>>> MESSAGE LOCALIZATION FAILED: Can't find resource for bundle
>>> java.util.PropertyResourceBundle, key Could not construct
>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>> at org.apache.ctakes

Re: cTAKES fuzzy string matching in NER tasks

2018-01-25 Thread James Masanz
Someone can correct me, but as far as I know there is no fuzzy matching in
the sense I think you mean: tolerating a misspelling or bad OCR result (for
example, one replaced character or one deleted character).

The closest thing I can think of is the way the LVG component allows cTAKES
to consider lexical variants (run vs runs vs ran), but that covers only valid
variations of a word, not misspellings.
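
One workaround outside cTAKES is to clean up the text before it reaches the pipeline. The sketch below is not a cTAKES feature; it uses Python's standard difflib module to snap near-miss tokens onto a known vocabulary. The function names, the tiny vocabulary, and the 0.8 cutoff are all assumptions for illustration.

```python
import difflib


def correct_token(token, vocabulary, cutoff=0.8):
    """Return the closest vocabulary term, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text, vocabulary, cutoff=0.8):
    """Apply correct_token to each whitespace-separated token."""
    return " ".join(correct_token(tok, vocabulary, cutoff) for tok in text.split())


if __name__ == "__main__":
    # Hypothetical mini-vocabulary; a real one would come from the dictionary in use.
    vocab = ["diabetes", "hypertension", "asthma"]
    print(correct_text("father has hx of diabetis", vocab))
```

A lower cutoff catches more OCR damage but also risks rewriting valid words into dictionary terms, so the threshold would need tuning against real notes.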

-- James




On Thu, Jan 25, 2018 at 3:22 PM, Schenk, Gundolf 
wrote:

> Hi,
>
>
>
> I too would like to have fuzzy matching in cTAKES. Do we have anything
> like it? Or is the typical approach to use a preprocessor to correct
> misspellings, clean up/standardize punctuation and such?
>
> Would be great to read any thoughts on this.
>
>
>
> Cheers,
>
> Gundolf.
>
>
>
>
>
> *From: *Desteny Child 
> *Reply-To: *"user@ctakes.apache.org" 
> *Date: *Wednesday, January 24, 2018 at 8:10 AM
> *To: *"user@ctakes.apache.org" 
> *Subject: *cTAKES fuzzy string matching in NER tasks
>
>
>
> Hello,
>
> I'd like to use cTAKES 4 for Named-Entity Recognition (NER) tasks. Right
> now I'm wondering whether it is possible to configure cTAKES to use fuzzy
> string matching for NER tasks. I need it because my documents are often
> not of very good quality and the words in them can be slightly corrupted
> (for example, after the OCR process).
>
> If so, could you please point me to the appropriate cTAKES documentation
> and samples in order to do it?
>
> Thanks in advance,
>
> Mike
>


Re: Error running CAS Visual Debugger [EXTERNAL]

2018-01-19 Thread James Masanz
The UMLS authentication is working for me right now, running from a 4.0
binary install.

Don, is it still not working for you? Are you running from the binary
install, or are you running within an IDE and using the "trunk" version of
the code?
Otherwise I wonder if something in your PATH is not playing nice.

-- James




On Fri, Jan 19, 2018 at 3:01 PM, Schenk, Gundolf 
wrote:

> Hi,
>
> The authentication works for me.
>
> Don, did you try
>
> C:\apache-ctakes-4.0.0>bin\runctakesCVD.bat  -Dctakes.umlsuser=donflinn
> -Dctakes.umlspw="yourpassw@rd"
>
> Cheers,
> Gundolf.
>
> On 1/19/18, 11:57 AM, "Miller, Timothy" wrote:
>
> I'm having trouble loading the url in my browser too. Could be that the
> UMLS web service is down. Is there a person at the NLM to contact about
> this?
>
> Tim
>
> On Fri, 2018-01-19 at 14:45 -0500, Don Flinn wrote:
> > I'm new to cTAKES and am attempting to run the example in the cTAKES
> > user guide.  I did the install as instructed.  When I ran
> > runctakesCVD.bat, the CAS Visual Debugger window popped up but I
> > got the following error:
> >  ...
> > 19 Jan 2018 08:15:43  INFO UmlsUserApprover - Checking UMLS Account
> > at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> > donflinn:
> >
> > 19 Jan 2018 08:15:44 ERROR UmlsUserApprover -
> > java.security.NoSuchAlgorithmException: Error constructing
> > implementation (algorithm: Default, provider: SunJSSE, class:
> > sun.security.ssl.SSLContextImpl$DefaultSSLContext)
> >
> > I'm assuming this means that my login to UMLS failed.  However, I
> > have a UMLS username and password and entered them into both the Java
> > commands in the batch files and also as environment variables.  I can
> > use these to log into UMLS so I know that they are correct.
> >
> > I'm running Windows 10 and  ctakes 4.0.0.
> >
> > If someone could let me know what I am doing wrong it would be
> > greatly appreciated.
> >
> > The full debug output is below.  Note I also tried to parse the text,
> > which as expected also failed.  I also commented out the echo off in
> > the script.
> > Don.
> > Output:
> > C:\apache-ctakes-4.0.0\bin>cd ..
> >
> > C:\apache-ctakes-4.0.0>bin\runctakesCVD.bat  desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml
> >
> > C:\apache-ctakes-4.0.0>set CURRENT_DIR=C:\apache-ctakes-4.0.0
> >
> > C:\apache-ctakes-4.0.0>if not "C:\apache-ctakes-4.0.0" == "" goto gotHome
> >
> > C:\apache-ctakes-4.0.0>if exist "C:\apache-ctakes-4.0.0\bin\runctakesCVD.bat" goto okHome
> >
> > C:\apache-ctakes-4.0.0>if exist "C:\Program Files\Java\jdk1.8.0_144\bin\java.exe" set PATH=C:\Program Files\Java\jdk1.8.0_144\bin;C:\ProgramData\Oracle\Java\javapath;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;c:\php;C:\Program Files (x86)\Subversion\bin;C:\Program Files\TortoiseSVN\bin;c:\unixcmds\usr\local\wbin;C:\Program Files\nodejs\;C:\maven\bin;c:\program files\java\jdk1.8.0_144\bin;c:\program files\java\jdk1.8.0_144\lib;C:\Users\don\AppData\Roaming\npm;C:\Users\don\AppData\Local\Microsoft\WindowsApps;C:\Program Files\Java\jdk1.8.0_144\bin;c:\openssl\bin;C:\apache-uima\bin;
> >
> > C:\apache-ctakes-4.0.0>cd C:\apache-ctakes-4.0.0
> >
> > C:\apache-ctakes-4.0.0>IF "desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml" == "" GOTO NoParam
> >
> > C:\apache-ctakes-4.0.0>IF NOT "" == "" GOTO MoreThanOneParam
> >
> > C:\apache-ctakes-4.0.0>if exist "C:\apache-ctakes-4.0.0\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml" (
> > echo CVD will load AE "C:\apache-ctakes-4.0.0\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml"
> >  java -Dctakes.umlsuser=donflinn -Dctakes.umlspw=threeOne4 -cp "C:\apache-ctakes-4.0.0\desc\;C:\apache-ctakes-4.0.0\resources\;C:\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cvd.CVD -desc desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml
> > )  else (
> > echo Unable to find descriptor "desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor.xml"
> >  GOTO NoParam
> > )
> > CVD will load AE "C:\apache-ctakes-4.0.0\desc\ctakes-

Re: Errors when executing AggregatePlaintextFastUMLSProcessor [EXTERNAL]

2018-01-18 Thread James Masanz
I wasn't clear - the names of the directories seem to be confusing things.
The files you mention are in a subdirectory called tables.
The hsql database files are in the HSqlDb subdirectory which was in your
list:

drwxr-xr-x   6 gschenk  192 Jan  5 11:28 HSqlDb

In that subdirectory, you should see files with names like lvg2008*
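
To check this without reading through directory listings, something like the following could list the database files and their sizes. It is a sketch only: the function name list_lvg_db_files is made up for illustration, and it assumes the HSqlDb subdirectory sits directly under the lvg data directory, as in your listing.

```python
from pathlib import Path


def list_lvg_db_files(lvg_data_dir):
    """Return (name, size_in_bytes) for each lvg2008* file under HSqlDb."""
    hsqldb = Path(lvg_data_dir) / "HSqlDb"
    return sorted(
        (p.name, p.stat().st_size) for p in hsqldb.glob("lvg2008*") if p.is_file()
    )


if __name__ == "__main__":
    import sys

    # Pass the .../org/apache/ctakes/lvg/data directory as the first argument.
    if len(sys.argv) > 1:
        for name, size in list_lvg_db_files(sys.argv[1]):
            print(f"{size:>12}  {name}")
```

An empty result would suggest the database files never made it into HSqlDb.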

Offhand I'm not sure whether the drugner engine is part of the default
Clinical Pipeline. I'll have to take a look at that.

On Thu, Jan 18, 2018 at 12:24 PM, Schenk, Gundolf 
wrote:

> Hi James,
>
>
>
> Many thanks for the link. It seems like the default Clinical Pipeline does
> not extract/annotate everything correctly. For example, in "father has hx
> of diabetes" the pipeline recognizes “diabetes” but does not set the
> subject attribute to “other” or “father”. Also, I have not seen any
> annotation from the drugner engine, although I understood that it is part
> of the default pipeline, is it not?
>
>
>
> On my installation I do not have the lvg files that you mention. In
> /Users/gschenk/NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg/data
> I have this structure:
>
> drwxr-xr-x   6 gschenk  192 Jan  5 11:28 HSqlDb
> -rw-r--r--@  1 gschenk  319 Jan  5 11:22 ReadMe.txt
> drwxr-xr-x   8 gschenk  256 Jan  5 11:23 Unicode
> -rw-rw-r--@  1 gschenk  264 Jan  5 11:22 build.txt
> drwxr-xr-x   7 gschenk  224 Jan  5 11:23 config
> drwxr-xr-x   7 gschenk  224 Jan  5 11:23 misc
> drwxr-xr-x  12 gschenk  384 Jan  5 11:23 rules
> drwxr-xr-x  11 gschenk  352 Jan 16 09:54 tables
> -rwxr-xr-x@  1 gschenk   37 Apr  2  2017 version.txt
>
> and in tables:
>
> -rwxr-xr-x  1 gschenk    2486063 Jan 16 09:54 acronym.data
> -rwxr-xr-x  1 gschenk   72583370 Jan 16 09:54 antiNorm.data
> -rwxr-xr-x  1 gschenk   31753678 Jan 16 09:54 canonical.data
> -rwxr-xr-x  1 gschenk     324456 Jan 16 09:54 derivation.data
> -rwxr-xr-x  1 gschenk  156803364 Jan 16 09:54 fruitful.data
> -rwxr-xr-x  1 gschenk   74664993 Jan 16 09:54 infl.data
> -rwxr-xr-x  1 gschenk     704862 Jan 16 09:54 nominalization.data
> -rwxr-xr-x  1 gschenk      83452 Jan 16 09:54 properNoun.data
> -rwxr-xr-x  1 gschenk     229622 Jan 16 09:54 synonyms.data
>
>
>
>
>
> Am I missing something (settings/resources/…)?
>
>
>
>
>
> Cheers,
>
> Gundolf.
>
>
>
>
>
> *From: *James Masanz 
> *Reply-To: *"user@ctakes.apache.org" 
> *Date: *Wednesday, January 17, 2018 at 5:43 AM
> *To: *"user@ctakes.apache.org" 
> *Subject: *Re: Errors when executing AggregatePlaintextFastUMLSProcessor
> [EXTERNAL]
>
>
>
> When I run the pipeline I also get 32 IdentifiedAnnotations
>
>
>
> On windows, I see these sizes for the lvg database files
>
>
>
> 922746880   lvg2008.data
>       104   lvg2008.properties
>   1312589   lvg2008.script
>
>
>
> This might help you get started with the meaning of the annotations:
>
>
>
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Assertion
>
>
>
>
>
>
>
>
>
> On Tue, Jan 16, 2018 at 2:52 PM, Schenk, Gundolf 
> wrote:
>
> Dear Tim and James,
>
>
>
> Many thanks for your replies. I run cTAKES as a downloaded “binary”
> from the mac osx Terminal app command line. I don’t use an IDE nor did I
> install it via maven nor from svn. I have downloaded the resources (again)
> and merged the lvg and dictionary folders into the designated places. The
> total size in the lvg subdirectory is 807.3 MB. Is this about the correct
> size, James? This did not change the errors output.
>
>
>
> I am not familiar with UIMA nor do I know much about nlp in general. I
> noticed that the errors only show when I load the AE xml file via the CVD
> interface. Using the pipeline via cmdline no errors are shown. In both
> cases I get 32 identified annotations. Is this number as expected?
>
>
>
> So, I guess the information extraction works to the extent of these
> annotators capability, and to go further or improve the result I would have
> to tweak the algorithm somehow. Agreed?
>
>
>
> Is there documentation that describes the meaning of the fields in the
> resulting xmi file? I would like to use the xmi file as input for
> post-processing. Thanks!
>
>
>
> Cheers,
>
> Gundolf.
>
>
>
>
>
>
>
> *From: *James Masanz 
> *Reply-To: *"user@ctakes.apache.org" 
> *Date: *Saturday, January 13, 2018 at 12:09 PM
> *To: *"user@ctakes.apache.org" 
> *Subject: *Re: Errors when executing AggregatePlaintextFastUMLSProcessor
> [EXTERNAL

Re: Errors when executing AggregatePlaintextFastUMLSProcessor [EXTERNAL]

2018-01-17 Thread James Masanz
When I run the pipeline I also get 32 IdentifiedAnnotations

On windows, I see these sizes for the lvg database files

922746880   lvg2008.data
      104   lvg2008.properties
  1312589   lvg2008.script

This might help you get started with the meaning of the annotations:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Assertion
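
For post-processing, a first look at what an xmi file contains can be had by counting its elements by type. This is a minimal sketch using only Python's standard library; the function name count_xmi_elements is an assumption, and the sample below uses made-up element names (MentionA, MentionB) rather than real cTAKES types.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_xmi_elements(xmi_text):
    """Count elements in an XMI document by local tag name (namespace removed)."""
    root = ET.fromstring(xmi_text)
    counts = Counter()
    for element in root.iter():
        # ElementTree writes namespaced tags as "{uri}LocalName".
        local_name = element.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical miniature XMI; real output carries many more types and attributes.
    sample = (
        '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:ex="http:///example.ecore">'
        '<ex:MentionA xmi:id="1" begin="0" end="5"/>'
        '<ex:MentionA xmi:id="2" begin="6" end="9"/>'
        '<ex:MentionB xmi:id="3" begin="10" end="12"/>'
        "</xmi:XMI>"
    )
    for name, count in sorted(count_xmi_elements(sample).items()):
        print(name, count)
```

For a file on disk, read its contents and pass the text to the same function.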




On Tue, Jan 16, 2018 at 2:52 PM, Schenk, Gundolf 
wrote:

> Dear Tim and James,
>
>
>
> Many thanks for your replies. I run cTAKES as a downloaded “binary”
> from the mac osx Terminal app command line. I don’t use an IDE nor did I
> install it via maven nor from svn. I have downloaded the resources (again)
> and merged the lvg and dictionary folders into the designated places. The
> total size in the lvg subdirectory is 807.3 MB. Is this about the correct
> size, James? This did not change the errors output.
>
>
>
> I am not familiar with UIMA nor do I know much about nlp in general. I
> noticed that the errors only show when I load the AE xml file via the CVD
> interface. Using the pipeline via cmdline no errors are shown. In both
> cases I get 32 identified annotations. Is this number as expected?
>
>
>
> So, I guess the information extraction works to the extent of these
> annotators capability, and to go further or improve the result I would have
> to tweak the algorithm somehow. Agreed?
>
>
>
> Is there documentation that describes the meaning of the fields in the
> resulting xmi file? I would like to use the xmi file as input for
> post-processing. Thanks!
>
>
>
> Cheers,
>
> Gundolf.
>
>
>
>
>
>
>
> *From: *James Masanz 
> *Reply-To: *"user@ctakes.apache.org" 
> *Date: *Saturday, January 13, 2018 at 12:09 PM
> *To: *"user@ctakes.apache.org" 
> *Subject: *Re: Errors when executing AggregatePlaintextFastUMLSProcessor
> [EXTERNAL]
>
>
>
>
>
> From this line in the error message
>
> "SET TABLE PUBLIC.LEXSYNONYM INDEX '202240 202240 0 0 5056'"
>
> it appears you might not have the full LVG database installed, which is an
> optional step of installing cTAKES.
>
>
>
> If I'm right about that and if you are running within an IDE, you can try
> to merge resources from
> https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-4.0-bin.zip/download
> into the relevant subproject. In your case (lvg) this would mean merging
> the contents of resources\org\apache\ctakes\lvg\ from that zip into
> ctakes-lvg-res\src\main\resources\org\apache\ctakes\lvg
>
>
>
> If you are running from the binary downloads, try merging the contents of
> resources\org\apache\ctakes\lvg\ from that zip into the resources
> directory, which in your case looks like it is:
>
> /Users/gschenk/NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg
>
>
>
> One way to see if you have the full LVG is to look at the sizes of the
> files in
>
> /Users/gschenk/NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg
>
> If you have the full LVG database, the files in that directory will be
> more than a GB in total.
>
>
>
> As Tim mentioned, if LVG fails, the rest of the pipeline will run.  The
> LVG component of cTAKES was written to improve recall but it's not clear if
> it is necessary any more, or under which circumstances it is more or less
> likely to be worth including in a pipeline.
>
>
>
> Some of the differences between the Default Clinical Pipeline and
> running the pipeline defined by AggregatePlaintextFastUMLSProcessor.xml
> are:
>
>
>
>  - the Default Clinical Pipeline uses piper files to define which
> components are run. The main one is DefaultFastPipeline.piper, which then
> references others.
>
>  - The fixedFlow element of AggregatePlaintextFastUMLSProcessor lists
> which components, including the LvgAnnotator, are run when you use
> AggregatePlaintextFastUMLSProcessor.xml
>
>
>
> For getting started with cTAKES, especially if you are running the code
> from the trunk of SVN, I recommend using piper files. Piper files are not a
> generic UIMA concept, they are specific to cTAKES.
>
>
>
> For people planning to use UIMA-AS rather than just UIMA, I'd recommend
> learning to use the XML descriptors, which you will want to understand when
> reading the UIMA-AS documentation.
>
>
>
> -- James
>
>
>
> On Sat, Jan 13, 2018 at 8:05 AM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
>
> I am not 100% about this terminology but I think the
> AggregatePlaintextFastUMLSProcessor.xml is just the Uima Descriptor file
> that describes the default Clinica

Re: cTAKES FastLookUp AE With Custom Dictionary

2018-01-13 Thread James Masanz
Were you able to solve this issue?

Looking at the Dictionary Creator GUI page you mentioned, I don't see a
reference to a UmlsLookupAnnotator.xml file or a "DictionaryDescriptor"
attribute.

Have you tried setting the fast dictionary parameter *LookupXml* to
org/apache/ctakes/dictionary/lookup/fast/DictionaryName.xml
(where you replace DictionaryName with whatever you entered in the Dictionary
Name field of the Dictionary Creator GUI)?
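
To avoid typos in that path, it can be built from the dictionary name. This is only a sketch of the substitution described above; the function name lookup_xml_path and the placeholder name "MyDictionary" are assumptions.

```python
def lookup_xml_path(dictionary_name):
    """Build the resource path suggested above for a custom fast-lookup dictionary."""
    name = dictionary_name.strip()
    if not name or "/" in name or "\\" in name:
        raise ValueError("dictionary name must be a bare name, not a path")
    return f"org/apache/ctakes/dictionary/lookup/fast/{name}.xml"


if __name__ == "__main__":
    # "MyDictionary" is a placeholder for the name entered in the GUI.
    print(lookup_xml_path("MyDictionary"))
```

The path is relative to the resources directory, so the generated XML file has to sit at that location for the lookup to find it.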


On Tue, Jan 2, 2018 at 4:00 AM, Prakhar Gaur 
wrote:

> Hello,
>
> I created a custom dictionary using the Dictionary Creator GUI tool, and the
> XML file for it was created successfully.
>
> After that, following the documentation
> (apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI, first
> information box at the bottom of the page), I made the required change to
> the "DictionaryDescriptor" attribute in the UmlsLookupAnnotator.xml file.
> But when I run the piper file with the piper-runner tool, the default
> sno_rx_16ab file is used and not the newly created one.
>
> If I use the '-l' option followed by the custom dictionary XML path with the
> piper-runner tool, then the custom dictionary is used for UMLS lookup.
>
> Can someone please help me get the solution involving changes to the
> UmlsLookupAnnotator.xml file to work?
>
> Regards,
>


Re: Errors in Invoking Degree-of-Text-Relation Annotator From Java

2018-01-13 Thread James Masanz
I don't know if this test is up to date, but I see that the test:
org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest
uses the line:
 builder.add(findDescription(DegreeOfRelationExtractorAnnotator.class));

You could try comparing what you are doing to what that test does.
If that test is not working, we should create a JIRA item so it is tracked
and fixed.


On Mon, Jan 8, 2018 at 1:21 AM, Prakhar Gaur 
wrote:

> Hello and happy new year wishes to Everyone,
>
> Config: cTAKES 4.0 integrated with my Java project in a Windows environment.
>
> I want to create an 'AnalysisEngine' with the DegreeOfTextRelation builder
> added programmatically in Java.
>
> I have tried two approaches for this, both unsuccessfully:
>
> A)
> Below are the two lines of code I have added to the AnalysisEngineDescription
> for this purpose.
>
> * AnalysisEngineFactory.createPrimitiveDescription(DegreeOfRelationExtractorAnnotator.class)
>
> * AnalysisEngineFactory.createEngineDescription(DegreeOfRelationExtractorAnnotator.class)
>
> Building the AnalysisEngineDescription with these lines throws an error:
> 'Error initializing "org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator" from descriptor .'
>
> B)
> Then I tried a jar-based approach by adding the line below:
>
> * AnalysisEngineFactory.createEngineDescription("/org/apache/ctakes/org/apache/ctakes/relationextractor/models/degree_of/model.jar")
>
> This too failed, with the error:
> An import could not be resolved.  No file with name
> "/org/apache/ctakes/org/apache/ctakes/relationextractor/models/location_of/model/jar.xml"
> was found in the class path or data path. (Descriptor: )
>
>
> Can someone please help me resolve either of these two errors?
> Comments, pointers, and solutions are all welcome.
>
> Regards,
>


Re: Errors when executing AggregatePlaintextFastUMLSProcessor [EXTERNAL]

2018-01-13 Thread James Masanz
From this line in the error message
"SET TABLE PUBLIC.LEXSYNONYM INDEX '202240 202240 0 0 5056'"
it appears you might not have the full LVG database installed, which is an
optional step of installing cTAKES.

If I'm right about that and if you are running within an IDE, you can try
to merge resources from
https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-4.0-bin.zip/download
into the relevant subproject. In your case (lvg) this would mean merging
the contents of resources\org\apache\ctakes\lvg\ from that zip into
ctakes-lvg-res\src\main\resources\org\apache\ctakes\lvg

If you are running from the binary downloads, try merging the contents of
resources\org\apache\ctakes\lvg\ from that zip into the resources
directory, which in your case looks like it is:
/Users/gschenk/NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg

One way to see if you have the full LVG is to look at the sizes of the
files in
/Users/gschenk/NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg
If you have the full LVG database, the files in that directory will be more
than a GB in total.
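
That size check can be scripted. The sketch below sums the file sizes under a directory and compares the total against the rule of thumb above; the function names are made up for illustration, and treating "a GB" as 10**9 bytes is my assumption.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def looks_like_full_lvg(root, threshold_bytes=10**9):
    """Apply the 'more than a GB in total' rule of thumb from the text above."""
    return directory_size_bytes(root) > threshold_bytes


if __name__ == "__main__":
    import sys

    # Pass the .../org/apache/ctakes/lvg directory as the first argument.
    if len(sys.argv) > 1:
        size = directory_size_bytes(sys.argv[1])
        print(f"{size} bytes; full LVG likely: {looks_like_full_lvg(sys.argv[1])}")
```

The total is only a hint: a directory can be large enough and still hold a damaged database.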

As Tim mentioned, if LVG fails, the rest of the pipeline will run.  The LVG
component of cTAKES was written to improve recall but it's not clear if it
is necessary any more, or under which circumstances it is more or less
likely to be worth including in a pipeline.

Some of the differences between the Default Clinical Pipeline and running
the pipeline defined by AggregatePlaintextFastUMLSProcessor.xml are:

 - the Default Clinical Pipeline uses piper files to define which
components are run. The main one is DefaultFastPipeline.piper, which then
references others.
 - The fixedFlow element of AggregatePlaintextFastUMLSProcessor lists which
components, including the LvgAnnotator, are run when you use
AggregatePlaintextFastUMLSProcessor.xml
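
To see which components an aggregate descriptor will run, the node entries under its fixedFlow element can be listed. This is a minimal sketch with Python's standard library; the function name fixed_flow_nodes is an assumption, and the sample descriptor is a cut-down illustration (FirstAnnotator is a made-up name), not the real file.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the "{namespace}" prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def fixed_flow_nodes(descriptor_xml):
    """Return the <node> names listed under <fixedFlow>, in document order."""
    root = ET.fromstring(descriptor_xml)
    nodes = []
    for element in root.iter():
        if _local(element.tag) == "fixedFlow":
            for child in element:
                if _local(child.tag) == "node" and child.text:
                    nodes.append(child.text.strip())
    return nodes


if __name__ == "__main__":
    # Illustrative fragment only; a real descriptor has many more elements.
    sample = """<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
      <analysisEngineMetaData>
        <flowConstraints>
          <fixedFlow>
            <node>FirstAnnotator</node>
            <node>LvgAnnotator</node>
          </fixedFlow>
        </flowConstraints>
      </analysisEngineMetaData>
    </analysisEngineDescription>"""
    print(fixed_flow_nodes(sample))
```

The order of the returned names is the order in which the aggregate runs its components.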

For getting started with cTAKES, especially if you are running the code
from the trunk of SVN, I recommend using piper files. Piper files are not a
generic UIMA concept, they are specific to cTAKES.

For people planning to use UIMA-AS rather than just UIMA, I'd recommend
learning to use the XML descriptors, which you will want to understand when
reading the UIMA-AS documentation.

-- James

On Sat, Jan 13, 2018 at 8:05 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> I am not 100% about this terminology but I think the
> AggregatePlaintextFastUMLSProcessor.xml is just the Uima Descriptor file
> that describes the default Clinical Pipeline (in ctakes 4.0 this should be
> the default).
>
> As for the errors, it looks like it is in lvg, which probably can fail and
> still have the overall pipeline work ok. Honestly I don't know whether it
> would change the output of the dictionary module.
>
> Are you running from the binary download of cTAKES? Or did you download
> lvg with maven? If the latter you could try removing the ctakes-lvg
> directory in your maven repo (under ~/.m2/repository) and force maven to
> re-download. If the former I'm not sure what to recommend.
>
> Sorry for the delayed response.
>
> Tim
>
>
> 
> From: Schenk, Gundolf 
> Sent: Friday, January 12, 2018 5:18 PM
> To: user@ctakes.apache.org
> Subject: Re: Errors when executing AggregatePlaintextFastUMLSProcessor
> [EXTERNAL]
>
> anyone?
>
> On 1/8/18, 11:47 AM, "Schenk, Gundolf"  wrote:
>
> Hi,
>
> I am new to cTAKES (and NLP) and I am trying to extract clinically
> relevant information from electronic freetext records.
> I am running the default Clinical Pipeline. But I have a few general
> questions for understanding and for some error output I am seeing.
>
> 1) what is the difference between the Clinical Pipeline and the
> processor (AggregatePlaintextFastUMLSProcessor.xml)?
>
> 2) when running AggregatePlaintextFastUMLSProcessor.xml on the
> dr_nutritious_1.txt example via the CVD gui I only see 32
> IdentifiedAnnotations but I get a couple of error messages. Could anyone
> help me get started, please?
> Here are the error/warn messages:
> […]
> 08 Jan 2018 10:06:22  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 08 Jan 2018 10:06:23  INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = /Users/gschenk/NotesProcessing/apache-ctakes-
> 4.0.0/resources/org/apache/ctakes/lvg/data/config/lvg.properties
> 08 Jan 2018 10:06:23  INFO LvgCmdApiResourceImpl -   config file
> absolute path = /Users/gschenk/NotesProcessing/apache-ctakes-
> 4.0.0/resources/org/apache/ctakes/lvg/data/config/lvg.properties
> 08 Jan 2018 10:06:23  INFO LvgCmdApiResourceImpl - cwd =
> /Users/gschenk/NotesProcessing/apache-ctakes-4.0.0
> 08 Jan 2018 10:06:23  INFO LvgCmdApiResourceImpl - cd /Users/gschenk/
> NotesProcessing/apache-ctakes-4.0.0/resources/org/apache/ctakes/lvg/
> 08 Jan 2018 10:06:23  INFO ENGINE - open start - state not modified
>

Re: cTAKES 4.0.1-SNAPSHOT - Problem with CVD and AggregatePlaintextFastUMLSProcessor

2017-12-26 Thread James Masanz
I haven't tested to verify, but it looks like this would also be fixed by
adding ctakes-drug-ner to the list of dependencies in the
ctakes-clinical-pipeline pom.xml file, since this resource problem was run
into while using IntelliJ (as opposed to downloading and running from the
convenience binary). That fix is discussed in this thread:

 https://s.apache.org/pMfO


On Mon, Dec 11, 2017 at 3:00 PM, Andrew Conkie <
andrew.con...@redstarconsulting.co.uk> wrote:

> Hi Manuel,
>
> I've just had a similar message with another UIMA system and the issue was
> that although the path to the Type System file was correct, it didn't lie
> in the data path. If this is the same problem, it can be solved by:
>
> a) Changing the Type System import in the DrugMentionAnnotator file to
> use an absolute path,
>
> ie replace
> 
>
> with
> 
>
>
> b) More globally - because there may be other files affected - by passing
> the file path as a startup parameter
>
> ie, setting the following as a Java env variable
>
> -Duima.datapath=C:/Users/Manel/Desktop/Tese_Mestrado_
> NLP/cTAKES_Project_NLP_Dev/ctakes/ the dir where all the Type System
> files are
>
>
> Interestingly, I was previously trying the Drug Annotator code myself but
> couldn't get it to work and moved onto something else. I'm planning on
> going back to it so if you had any advice that would be appreciated.
>
> Cheers
> Andrew
>
>
>
>
> On Sat, Dec 9, 2017 at 11:03 PM, Manuel Lamy  wrote:
>
>> Hello guys,
>>
>> I'm aiming to extend cTAKES in order to fulfill the objectives of a
>> project I'm working on. So I decided to check out the code of trunk and
>> start checking it. I followed all the steps of the developer's guide found
>> here:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
>>
>> I'm using Intellij by the way. Well, when I try to run CVD in order to
>> test some changes I made in the code, I have a problem. I want to use
>> the AggregatePlaintextFastUMLSProcessor. When I try to load it in the
>> CVD UI, I have the following error:
>>
>> -
>>
>>   
>> *org.apache.uima.resource.ResourceInitializationException:
>> An import could not be resolved.  No file with name
>> "org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class
>> path or data path. (Descriptor:
>> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Project_NLP_Dev/ctakes/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>>
>> *-*
>>
>> The DrugMentionAnnotator.xml indeed has an import of the TypeSystem.xml
>> file. And this TypeSystem file physically exists in the path described in
>> the DrugMentionAnnotator.xml import. So, I don't really know why I have this
>> error in CVD, since the file that is originating the problem exists indeed
>> in the expectable path.
>>
>> I hope some of you guys can give me some help, because I don't know what
>> else to do in order to solve this issue.
>>
>> Thanks for your attention in advance.
>>
>> Best Regards,
>>
>> ML
>>
>
>


Re: Error - cTAKES 4.0.0 - Intellij

2017-12-26 Thread James Masanz
Thanks for letting me and everyone else know that the workaround is
working. I think we'll have to consider a better home (different pom file)
for those profiles so they work with all modules/components. But I'm glad
we have a workaround for now.

-- James

On Tue, Dec 26, 2017 at 1:55 PM, Manuel Lamy  wrote:

> Hello James,
>
> I just tested, it went OK. I could finally load the
> AggregatePlaintextFastUMLSProcessor.xml with success in CVD and run it.
>
> I just added the dependency to the pom file as I told you.
>
> Thanks for your help. It is important to have people like you around
> here, with such good knowledge of the system.
>
> I'll keep updating if I have any other problem.
>
> Best regards,
>
> Manuel
>
> 2017-12-26 18:49 GMT+00:00 James Masanz :
>
>>
>> You can try that, sure, and post back whether it works OK. I didn't want
>> to suggest that until I checked whether it would cause a cyclic dependency,
>> which I haven't checked yet.
>>
>> -- James
>>
>> On Tue, Dec 26, 2017 at 1:35 PM, Manuel Lamy  wrote:
>>
>>> Hello James,
>>>
>>> Thanks for your quick answer. I've been battling this problem in the
>>> last few days.
>>>
>>> So should I start by adding ctakes-drug-ner as a dependency of
>>> ctakes-clinical-pipeline module?
>>>
>>> I see where the error comes from yes. I hope to hear from you later
>>> concerning this problem.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>> Manuel Lamy
>>>
>>> 2017-12-26 18:28 GMT+00:00 James Masanz :
>>>
>>>> The runCVD profile is defined in the pom.xml file that is within
>>>> ctakes-clinical-pipeline, and that pom doesn't include ctakes-drug-ner in
>>>> its list of dependencies. I'll take a look this afternoon at a
>>>> recommendation.
>>>>
>>>> The AggregatePlaintextFastUMLSProcessor.xml file uses a relative path
>>>> when referencing DrugMentionAnnotator.xml but the class folder for
>>>> ctakes-drug-ner module is not in the classpath because the ctakes-drug-ner
>>>> is not in the list of dependencies
>>>>
>>>>
>>>>
>>>> On Fri, Dec 22, 2017 at 11:20 AM, Manuel Lamy 
>>>> wrote:
>>>>
>>>>> Hello guys,
>>>>>
>>>>> I'm using cTAKES version 4.0.0, developer version, using Intellij. I
>>>>> did all the setup of the project as per the Intellij section in the
>>>>> *Developer Install Guide* present in the url:
>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
>>>>>
>>>>> Even so, I'm unfortunately having a strange error when I try to load
>>>>> the *AggregatePlaintextFastUMLSProcessor.xml* in CAS Visual Debugger
>>>>> (CVD module).
>>>>>
>>>>> The error is the following:
>>>>>
>>>>>
>>>>> [image: Imagem inline 1]
>>>>> I also leave the top of the stack trace from the log:
>>>>>
>>>>>  *2017-12-22T15:59:06*
>>>>> *  1513958346021*
>>>>> *  2*
>>>>> *  org.apache.uima*
>>>>> *  SEVERE*
>>>>> *  org.apache.uima.tools.cvd.MainFrame*
>>>>> *  handleException(527)*
>>>>> *  22*
>>>>>
>>>>> *  Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
>>>>> (Descriptor:
>>>>> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>>>>>
>>>>> *  *
>>>>> *
>>>>> org.apache.uima.resource.ResourceInitializationException:
>>>>> Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
>>>>> (Descriptor:
>>>>> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>>>>> **
>>>>> *
>>>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
>>>>> *  initializeAnalysisComponent*
>>>>> *  224*
>>>>> **
>>>>> **
>>>>> *
>>>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
>>>>> *  initialize*
>>>>> *  172*
>>>>>
>>>>>
>>>>> I can't quite understand this error, since the class
>>>>> org.apache.ctakes.drugner.ae.DrugMentionAnnotator *physically exists* in
>>>>> the path referenced in the xml descriptor file. Has anyone else had this
>>>>> problem?
>>>>>
>>>>> I'm blocked. I just can't advance in my project because all the
>>>>> solutions I came up with to this problem, like passing the full path,
>>>>> didn't work at all.
>>>>>
>>>>> I would appreciate it if someone could give me some help. Thanks in
>>>>> advance.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Manuel
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Error - cTAKES 4.0.0 - Intellij

2017-12-26 Thread James Masanz
You can try that, sure, and post back whether it works OK. I didn't want to
suggest that until I checked whether it would cause a cyclic dependency,
which I haven't checked yet.

-- James

On Tue, Dec 26, 2017 at 1:35 PM, Manuel Lamy  wrote:

> Hello James,
>
> Thanks for your quick answer. I've been battling this problem in the last
> few days.
>
> So should I start by adding ctakes-drug-ner as a dependency of
> ctakes-clinical-pipeline module?
>
> I see where the error comes from yes. I hope to hear from you later
> concerning this problem.
>
> Thank you!
>
> Best regards,
>
> Manuel Lamy
>
> 2017-12-26 18:28 GMT+00:00 James Masanz :
>
>> The runCVD profile is defined in the pom.xml file that is within
>> ctakes-clinical-pipeline, and that pom doesn't include ctakes-drug-ner in
>> its list of dependencies. I'll take a look this afternoon at a
>> recommendation.
>>
>> The AggregatePlaintextFastUMLSProcessor.xml file uses a relative path
>> when referencing DrugMentionAnnotator.xml but the class folder for
>> ctakes-drug-ner module is not in the classpath because the ctakes-drug-ner
>> is not in the list of dependencies
>>
>>
>>
>> On Fri, Dec 22, 2017 at 11:20 AM, Manuel Lamy  wrote:
>>
>>> Hello guys,
>>>
>>> I'm using cTAKES version 4.0.0, developer version, using Intellij. I did
>>> all the setup of the project as per the Intellij section in the *Developer
>>> Install Guide* present in the url:
>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
>>>
>>> Even so, I'm unfortunately having a strange error when I try to load the
>>> *AggregatePlaintextFastUMLSProcessor.xml* in CAS Visual Debugger (CVD module).
>>>
>>> The error is the following:
>>>
>>>
>>> [image: Imagem inline 1]
>>> I also leave the top of the stack trace from the log:
>>>
>>>  *2017-12-22T15:59:06*
>>> *  1513958346021*
>>> *  2*
>>> *  org.apache.uima*
>>> *  SEVERE*
>>> *  org.apache.uima.tools.cvd.MainFrame*
>>> *  handleException(527)*
>>> *  22*
>>>
>>> *  Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
>>> (Descriptor:
>>> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>>>
>>> *  *
>>> *org.apache.uima.resource.ResourceInitializationException:
>>> Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
>>> (Descriptor:
>>> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>>> **
>>> *
>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
>>> *  initializeAnalysisComponent*
>>> *  224*
>>> **
>>> **
>>> *
>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
>>> *  initialize*
>>> *  172*
>>>
>>>
>>> I can't quite understand this error, since the class
>>> org.apache.ctakes.drugner.ae.DrugMentionAnnotator *physically exists* in
>>> the path referenced in the xml descriptor file. Has anyone else had this
>>> problem?
>>>
>>> I'm blocked. I just can't advance in my project because all the
>>> solutions I came up with to this problem, like passing the full path,
>>> didn't work at all.
>>>
>>> I would appreciate it if someone could give me some help. Thanks in
>>> advance.
>>>
>>> Best regards,
>>>
>>> Manuel
>>>
>>
>>
>


Re: Error - cTAKES 4.0.0 - Intellij

2017-12-26 Thread James Masanz
The runCVD profile is defined in the pom.xml file that is within
ctakes-clinical-pipeline, and that pom doesn't include ctakes-drug-ner in
its list of dependencies. I'll take a look this afternoon at a
recommendation.

The AggregatePlaintextFastUMLSProcessor.xml file uses a relative path when
referencing DrugMentionAnnotator.xml, but the class folder for the
ctakes-drug-ner module is not on the classpath because ctakes-drug-ner
is not in the list of dependencies.
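
A quick way to confirm that diagnosis is to check whether a pom lists the module at all. The sketch below is only an illustration: SAMPLE_POM is a hypothetical, trimmed-down pom (not the real ctakes-clinical-pipeline pom.xml), the helper names are made up for this example, and it only looks at dependencies declared directly under <dependencies>, not transitive or profile-level ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down pom used only for illustration; the real
# ctakes-clinical-pipeline pom.xml lists many more dependencies.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>ctakes-clinical-pipeline</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-core</artifactId>
    </dependency>
  </dependencies>
</project>"""

# Maven poms normally declare this default namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def declared_artifacts(pom_xml):
    """Return the artifactIds declared directly under <dependencies>."""
    root = ET.fromstring(pom_xml)
    return {
        dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        for dep in root.findall("m:dependencies/m:dependency", POM_NS)
    }


def has_dependency(pom_xml, artifact_id):
    """True if the pom directly declares the given artifactId."""
    return artifact_id in declared_artifacts(pom_xml)


print(has_dependency(SAMPLE_POM, "ctakes-drug-ner"))
```

To check a real checkout, read the pom.xml text from disk and pass it in place of SAMPLE_POM.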



On Fri, Dec 22, 2017 at 11:20 AM, Manuel Lamy  wrote:

> Hello guys,
>
> I'm using cTAKES version 4.0.0, developer version, using Intellij. I did
> all the setup of the project as per the Intellij section in the *Developer
> Install Guide* present in the url:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
>
> Even so, I'm unfortunately having a strange error when I try to load the
> *AggregatePlaintextFastUMLSProcessor.xml* in CAS Visual Debugger (CVD module).
>
> The error is the following:
>
>
> [image: Imagem inline 1]
> I also leave the top of the stack trace from the log:
>
>  *2017-12-22T15:59:06*
> *  1513958346021*
> *  2*
> *  org.apache.uima*
> *  SEVERE*
> *  org.apache.uima.tools.cvd.MainFrame*
> *  handleException(527)*
> *  22*
>
> *  Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
> (Descriptor:
> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
>
> *  *
> *org.apache.uima.resource.ResourceInitializationException:
> Annotator class "org.apache.ctakes.drugner.ae.DrugMentionAnnotator" was not found.
> (Descriptor:
> file:/C:/Users/Manel/Desktop/Tese_Mestrado_NLP/cTAKES_Dev_Tese/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)*
> **
> *
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
> *  initializeAnalysisComponent*
> *  224*
> **
> **
> *
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl*
> *  initialize*
> *  172*
>
>
> I can't quite understand this error, since the class
> org.apache.ctakes.drugner.ae.DrugMentionAnnotator *physically exists* in
> the path referenced in the xml descriptor file. Has anyone else had this
> problem?
>
> I'm blocked. I just can't advance in my project because all the solutions
> I came up with to this problem, like passing the full path, didn't work at
> all.
>
> I would appreciate it if someone could give me some help. Thanks in
> advance.
>
> Best regards,
>
> Manuel
>


Re: Fuzzy String Matching against UMLS Data

2017-12-19 Thread James Masanz
Have you looked at the section titled "Text Overlap Match" on the wiki
page for Fast Dictionary Lookup?
Or are you looking to fuzzy match individual words?

On Fri, Dec 15, 2017 at 2:19 AM, Desteny Child  wrote:

> Hello,
>
> Does cTAKES contain the possibility to configure UMLS dictionary lookup to
> use fuzzy string matching instead of full string matching?
>
> Thanks,
> Mike
>


Re: Slowness in processing files [EXTERNAL]

2017-12-16 Thread James Masanz
I created a 2MB file by concatenating together many copies of (the text
version of) Peds_FebrileSez_1 and it still isn't finished after many hours.
So that's going to require some debugging.  Until someone gets to debugging
that:

As Jonas S pointed out, 11 seconds for 2K does mean ~3 hrs for 2MB, if
linear, and I don't expect all components to be as nice as linear, though I
don't have numbers offhand.
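
The linear estimate can be written out as a small calculation. This is only a sketch of the arithmetic under the linear-scaling assumption already stated; the function name is made up for the example, and real runtimes may grow faster than linearly.

```python
def extrapolate_seconds(measured_seconds, measured_bytes, target_bytes):
    """Scale a measured runtime linearly with input size (an assumption)."""
    return measured_seconds * (target_bytes / measured_bytes)


KB = 1000        # decimal units; 1024-based units give the same ratio here
MB = 1000 * KB

# 11 seconds for a 2 KB file, extrapolated to a 2 MB file.
estimate = extrapolate_seconds(11, 2 * KB, 2 * MB)
print(f"{estimate:.0f} s is about {estimate / 3600:.1f} h")
# prints "11000 s is about 3.1 h"
```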

A few ideas:
 - Are there parts of the files that can be ignored? 2MB seems large.
Using a sectionizer as the first part of the pipeline and having later
components skip processing some sections should help, if you don't need the
entire document annotated.

 - Are there some components you could do without?

 - You could try replacing some of the annotators with others that run
faster; for example, there are rule-based versions of the polarity,
subject, certainty, and history-of components (see NE Contexts):

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+NE+Contexts

Also, there is initialization time for each component: if you process 10
documents, it doesn't take 10 times as long as a single document.  So to
get a sense of how things will scale for you, you need to run multiple
documents. For example, a single document might take 11 seconds while 10
copies take 75 seconds, not 110.
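
One way to read those two numbers is as a fixed startup cost plus a constant cost per document. The sketch below fits that simple model to the 11 s and 75 s figures; the model and the function name are assumptions made for illustration, not something cTAKES reports.

```python
def fit_startup_and_per_document(n1, t1, n2, t2):
    """Fit total = startup + n * per_doc through two (documents, seconds) points."""
    per_doc = (t2 - t1) / (n2 - n1)
    startup = t1 - n1 * per_doc
    return startup, per_doc


# 1 document in 11 s, 10 copies in 75 s (the example figures above).
startup, per_doc = fit_startup_and_per_document(1, 11, 10, 75)
print(f"startup ~{startup:.1f} s, per document ~{per_doc:.1f} s")

# Rough projection for a larger batch under the same assumed model.
projected_100 = startup + 100 * per_doc
print(f"100 documents ~{projected_100:.0f} s")
```

With these inputs the fit gives roughly 3.9 s of startup and 7.1 s per document, so the single-document timing overstates the per-document cost.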

-- James






On Fri, Dec 15, 2017 at 3:16 PM, James Masanz 
wrote:

>
> I tried an input file of 5.5K and I was surprised to find it took 11
> seconds on my laptop.
>
> I'll run a 2MB input file and post results tomorrow. I'll also compare
> running from binary vs. running from within an IDE in case the timings are
> affected by the size of the jars built for the binary install.
>
> With the 5.5K input file, the annotators taking the most time were
>   ConstituencyParser - 39%
>   HistoryCleartk - 11%
>   PolarityCleartk - 11%
>   LVG annotator - 8%
>   GenericCleartk - 7.5%
>
> Note the above numbers are from a single run of a single file.
>
> If you're not using the output of any of the annotators that are among the
> longer-running ones in your environment (or  any downstream annotators that
> depend upon their output), you could consider removing some of them from
> your pipeline.
>
> For those not familiar with the CPE Gui, after it processes a set of
> documents, it outputs a performance report showing the percentage and
> absolute time taken by each annotator in a pipeline.
>
>
>
>
> On Thu, Dec 14, 2017 at 2:15 PM, Yadav, Harish 
> wrote:
>
>> Hi James,
>>
>>
>>
>> Below is the CAS consumer detail:
>>
>>
>>
>> FileWriterCasConsumer
>>
>>
>>
>> Descriptor in collection reader:
>>
>>
>>
>> FilesInDirectoryCollectionReader.xml
>>
>>
>>
>> The contents of AggregatePlaintextFastUMLSProcessor are not changed and
>> I have always used CPE GUI by clear all option. I am not sure of hard drive
>> error logs, but will check that as one of the possibilities.
>>
>>
>>
>> Could you please let me know approximately how much time it took for you
>> to run files of sizes ~2Mb (or if you can share any other benchmarks for
>> other file sizes you used earlier)
>>
>>
>>
>> Regards,
>>
>> Harish.
>>
>>
>>
>> *From:* James Masanz [mailto:masanz.ja...@gmail.com]
>> *Sent:* Thursday, December 14, 2017 1:21 PM
>> *To:* user@ctakes.apache.org
>> *Subject:* Re: Slowness in processing files [EXTERNAL]
>>
>>
>>
>> sorry, I meant verify that the contents of  the xml file for the fast
>> dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor)
>>
>>
>>
>> On Thu, Dec 14, 2017 at 1:20 PM, James Masanz 
>> wrote:
>>
>>
>>
>> Harish,
>>
>>
>>
>> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
>> that long. It sounds like either something outside of cTAKES is having an
>> issue (a hard drive starting to fail) or that you are accidentally
>> running AggregatePlaintextUMLSProcessor.
>>
>>
>> I've had issues with the CPE GUI not always behaving well for me.
>>
>>
>>
>> I suggest when you run the CPE GUI, you use File->Clear all and
>> re-enter/re-select what you want.
>>
>> If that doesn't help, verify that the contents
>> of AggregatePlaintextUMLSProcessor haven't been changed.
>>
>>
>> If none of that helps, as a last resort, I'd look into hard drive error
>> logs.
>>
>>
>>
>> Also, are you using a CAS Consumer? If so, which one?
>>
>>
>>
>

Re: Slowness in processing files [EXTERNAL]

2017-12-15 Thread James Masanz
I tried an input file of 5.5K and I was surprised to find it took 11
seconds on my laptop.

I'll run a 2MB input file and post results tomorrow. I'll also compare
running from binary vs. running from within an IDE in case the timings are
affected by the size of the jars built for the binary install.

With the 5.5K input file, the annotators taking the most time were
  ConstituencyParser - 39%
  HistoryCleartk - 11%
  PolarityCleartk - 11%
  LVG annotator - 8%
  GenericCleartk - 7.5%

Note the above numbers are from a single run of a single file.

If you're not using the output of any of the annotators that are among the
longer-running ones in your environment (or  any downstream annotators that
depend upon their output), you could consider removing some of them from
your pipeline.
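
To get a feel for what removing an annotator might buy, the percentages above can be turned into rough seconds. This sketch assumes the percentages apply to the whole 11-second run and that dropping one annotator leaves the others' time unchanged; neither assumption was verified here, and the helper names are made up for the example.

```python
TOTAL_SECONDS = 11.0  # the single 5.5K run reported above

# Percent of run time per annotator, as listed above (single run, single file).
SHARES = {
    "ConstituencyParser": 39.0,
    "HistoryCleartk": 11.0,
    "PolarityCleartk": 11.0,
    "LVG annotator": 8.0,
    "GenericCleartk": 7.5,
}


def seconds_by_annotator(total_seconds, shares):
    """Convert percentage shares into approximate seconds."""
    return {name: total_seconds * pct / 100 for name, pct in shares.items()}


def remaining_seconds(total_seconds, shares, removed):
    """Estimated run time after dropping the named annotators."""
    dropped_pct = sum(shares[name] for name in removed)
    return total_seconds * (1 - dropped_pct / 100)


for name, secs in seconds_by_annotator(TOTAL_SECONDS, SHARES).items():
    print(f"{name}: ~{secs:.2f} s")

without_parser = remaining_seconds(TOTAL_SECONDS, SHARES, ["ConstituencyParser"])
print(f"without ConstituencyParser: ~{without_parser:.2f} s")
```

Any downstream annotator that depends on a removed component would also have to go, so treat the result as a rough estimate rather than a prediction.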

For those not familiar with the CPE GUI, after it processes a set of
documents, it outputs a performance report showing the percentage and
absolute time taken by each annotator in a pipeline.




On Thu, Dec 14, 2017 at 2:15 PM, Yadav, Harish  wrote:

> Hi James,
>
>
>
> Below is the CAS consumer detail:
>
>
>
> FileWriterCasConsumer
>
>
>
> Descriptor in collection reader:
>
>
>
> FilesInDirectoryCollectionReader.xml
>
>
>
> The contents of AggregatePlaintextFastUMLSProcessor are not changed and I
> have always used CPE GUI by clear all option. I am not sure of hard drive
> error logs, but will check that as one of the possibilities.
>
>
>
> Could you please let me know approximately how much time it took for you
> to run files of sizes ~2Mb (or if you can share any other benchmarks for
> other file sizes you used earlier)
>
>
>
> Regards,
>
> Harish.
>
>
>
> *From:* James Masanz [mailto:masanz.ja...@gmail.com]
> *Sent:* Thursday, December 14, 2017 1:21 PM
> *To:* user@ctakes.apache.org
> *Subject:* Re: Slowness in processing files [EXTERNAL]
>
>
>
> sorry, I meant verify that the contents of  the xml file for the fast
> dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor)
>
>
>
> On Thu, Dec 14, 2017 at 1:20 PM, James Masanz 
> wrote:
>
>
>
> Harish,
>
>
>
> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
> that long. It sounds like either something outside of cTAKES is having an
> issue (a hard drive starting to fail) or that you are accidentally running
> AggregatePlaintextUMLSProcessor.
>
>
> I've had issues with the CPE GUI not always behaving well for me.
>
>
>
> I suggest when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
>
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
>
>
> Also, are you using a CAS Consumer? If so, which one?
>
>
>
>
>
> On Thu, Dec 14, 2017 at 12:04 PM,  wrote:
>
> If a 2kb file takes about 11 seconds, then a 2mb file is expected to take
> ~11*1000 seconds which is about 3 hours (under the assumption that the
> runtime is linear to the file size).
>
> I do not know if the pipeline can be sped up. I would suggest to chunk the
> file into smaller chunks (pieces) and run the pipeline in parallel for each
> chunk.
>
> Jonas S
>
> On 14.12.17 at 17:48, Yadav, Harish wrote:
>
> Hi Timothy,
>
> Sorry for the typo, I meant ran with AE AggregatePlaintextFastUMLSProcessor
> with -Xms6g -Xmx6g, but still it takes a lot of time ( ~more than 2 hours)
> for a single file of 2 Mb size. It runs fine for 2 Kb file.
>
> Regards,
> Harish.
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Thursday, December 14, 2017 11:22 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> You missed the most important part of my message:
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
>
>
> Use AggregatePlaintextFastUMLSProcessor
>
> Tim
>
>
> On Thu, 2017-12-14 at 16:15 +, Yadav, Harish wrote:
>
> Hi Timothy,
>
> I fixed the password issues and ran with AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but still it takes a
> lot of time ( ~more than 2 hours) for a single file of 2 Mb size. I
> have checked the memory consumption of the process and it never goes
> above 4.5 G, so I am not sure if it is the memory issue. However, AE
> AggregatePlainTextProcessor process the 2KB file in ~11 seconds, but
> most of our files are in Mbs so processing time for each file for more
> than 2 hours is not feasible.
>
> Could you ple

Re: Slowness in processing files [EXTERNAL]

2017-12-14 Thread James Masanz
Sorry, I meant: verify that the contents of the XML file for the fast
dictionary lookup (AggregatePlaintextFastUMLSProcessor) haven't changed.

On Thu, Dec 14, 2017 at 1:20 PM, James Masanz 
wrote:

>
> Harish,
>
> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
> that long. It sounds like either something outside of cTAKES is having an
> issue (a hard drive starting to fail) or that you are accidentally running
> AggregatePlaintextUMLSProcessor.
>
> I've had issues with the CPE GUI not always behaving well for me.
>
> I suggest when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
> Also, are you using a CAS Consumer? If so, which one?
>
>
> On Thu, Dec 14, 2017 at 12:04 PM,  wrote:
>
>> If a 2kb file takes about 11 seconds, then a 2mb file is expected to take
>> ~11*1000 seconds which is about 3 hours (under the assumption that the
>> runtime is linear to the file size).
>>
>> I do not know if the pipeline can be sped up. I would suggest to chunk
>> the file into smaller chunks (pieces) and run the pipeline in parallel for
>> each chunk.
>>
>> Jonas S
>>
>> On 14.12.17 at 17:48, Yadav, Harish wrote:
>>
>>> Hi Timothy,
>>>
>>> Sorry for the typo, I meant ran with AE AggregatePlaintextFastUMLSProcessor
>>> with -Xms6g -Xmx6g, but still it takes a lot of time ( ~more than 2 hours)
>>> for a single file of 2 Mb size. It runs fine for 2 Kb file.
>>>
>>> Regards,
>>> Harish.
>>>
>>> -Original Message-
>>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>>> Sent: Thursday, December 14, 2017 11:22 AM
>>> To: user@ctakes.apache.org
>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>
>>> You missed the most important part of my message:
>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>>
>>>
>>> Use AggregatePlaintextFastUMLSProcessor
>>>
>>> Tim
>>>
>>>
>>> On Thu, 2017-12-14 at 16:15 +, Yadav, Harish wrote:
>>>
>>>> Hi Timothy,
>>>>
>>>> I fixed the password issues and ran with AE
>>>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but still it takes a
>>>> lot of time ( ~more than 2 hours) for a single file of 2 Mb size. I
>>>> have checked the memory consumption of the process and it never goes
>>>> above 4.5 G, so I am not sure if it is the memory issue. However, AE
>>>> AggregatePlainTextProcessor process the 2KB file in ~11 seconds, but
>>>> most of our files are in Mbs so processing time for each file for more
>>>> than 2 hours is not feasible.
>>>>
>>>> Could you please suggest something which may improve the performance.
>>>> Below are the logs for the process of 2 Mb file with
>>>> AggregatePlainTextProcessor:
>>>>
>>>>
>>>>
>>>> Logs:
>>>>
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
>>>> org.apache.uima.tools.cpm.CpmFrame
>>>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences 
>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>>> at root 0x8002. Windows
>>>> RegCreateKeyEx(...) returned error code 5.
>>>> log4j: reset attribute= "false".
>>>> log4j: Threshold ="null".
>>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>>> log4j: Setting [ProgressAppender] additivity to [false].
>>>> log4j: Level value for ProgressAppender is  [INFO].
>>>> log4j: ProgressAppender level set to INFO
>>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>>> log4j: Setting property [conversionPa

Re: Slowness in processing files [EXTERNAL]

2017-12-14 Thread James Masanz
tionProcessingEngine(UIMAFramework.java:918)
>>>>     at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>>>     at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>>>     at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>>>     ... 5 more
>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>>>     ... 9 more
>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>>     ... 24 more
>>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>>>     ... 25 more
>>>> Caused
>>>> by: java.lang.reflect.InvocationTargetException at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>> Method)
>>>>  at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>>>> Source)
>>>>  at
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
>>>> Source) at
>>>> java.lang.reflect.Constructor.newInstance(Unknown
>>>> Source) at
>>>> org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
>>>> rP
>>>> arser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>> ... 28 more Caused by: java.sql.SQLException: Invalid User for UMLS
>>>> dictionary sno_rx_16abTerms at
>>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
>>>> ti
>>>> onary.(UmlsJdbcRareWordDictionary.java:29) ... 33 more
>>>>   From: James Masanz [mailto:masanz.ja...@gmail.com]
>>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>>> To: user@ctakes.apache.org
>>>> Subject: Re: Slowness in processing files
>>>>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
>>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>>> just use AggregatePlaintextFastUMLSProcessor.
>>>>   Do you mean it is taking ~5 hours for a single file to be processed
>>>> at times, or is that for a set of files?
>>>>   If your JVM heap space is not set large enough, you can get very
>>>> slow results.
>>>> Try increasing to 5G (or more) using the JVM parameter -Xmx5G.
>>>> For faster start up, you can also set the -Xms to the same or something
>>>> close to -Xmx value.
>>>> -- James
>>>>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish wrote:
>>>> Hi All,
>>>>   When the medical records are run with the AE as
>>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
>>>> the processing is very slow. It is pretty fast when the smaller
>>>> files
>>>> (~2 kb) are fed as input but when I am processing with bigger files
>>>> say, 2Mb, it is very slow and the files are taking ~5 hours to
>>>> process. Any pointer will be of great help.
>>>>   Regards,
>>>> Harish.
>>>>
>>>>
>>>
>>


Re: Slowness in processing files

2017-12-13 Thread James Masanz
Using AggregatePlaintextFastUMLSProcessor  is much faster than
AggregatePlainTextProcessor, so I suggest that to start with you just use
AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at
times, or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow
results.
Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
For faster startup, you can also set -Xms to the same value as -Xmx, or
something close to it.
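
One way to confirm that the -Xmx value is actually reaching the JVM is to print the heap limit from inside Java. This is only a sketch: the HeapCheck class is made up for illustration and is not part of cTAKES, though Runtime.maxMemory() is standard Java.

```java
// Hypothetical helper, not part of cTAKES.
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the JVM will attempt to use,
        // which normally tracks the -Xmx setting.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

Compile it and run it with the same flags you give cTAKES, for example java -Xms5G -Xmx5G HeapCheck. The reported figure can come out a little under the requested size depending on the garbage collector, but a value far below it suggests the flag is not being picked up.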

 -- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish  wrote:

> Hi All,
>
>
>
> When the medical records are run with the AE as
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor the
> processing is very slow. It is pretty fast when the smaller files (~2 kb)
> are fed as input but when I am processing with bigger files say, 2Mb, it is
> very slow and the files are taking ~5 hours to process. Any pointer will be
> of great help.
>
>
>
> Regards,
>
> Harish.
>


Re: AE for Non UMLS Dictionary

2017-12-09 Thread James Masanz
Yes, only credentials go out over the internet, not any of the text being
processed.


On Fri, Dec 8, 2017 at 7:34 AM, Smith, Lincoln 
wrote:

> This is also a problem for organizations that want to use the UMLS
> dictionary but have compliance and data security concerns. My understanding
> is that only credentials go out and no packets of other sensitive
> information are exiting (right?) through an API but this is not always
> clear from documentation.
>
>
>
> *From:* Prakhar Gaur [mailto:prakhar.g...@infosys.com]
> *Sent:* Friday, December 08, 2017 8:02 AM
> *To:* user (user@ctakes.apache.org)
> *Subject:* [EXTERNAL] AE for Non UMLS Dictionary
>
>
>
> Hello,
>
>
>
> Are there any Analysis Engine available for Non UMLS dictionary ?
>
>
>
> Use case is for our application, if Internet connection is not available
> the pipeline (Specifically DefaultJcasTermAnnotator AE) will fail as UMLS
> user ID authentication will not happen.
>
> So as an alternative I want to have another AE that does not depend on
> UMLS authentication, as a failsafe to the one that relies on connectivity.
>
>
>
> I understand the annotation would be different and “lower quality” (less
> useful) from the UMLS dictionary dependent one.
>
>
>
> Regards,
>
> --
>
> The information contained in this transmission may contain privileged and
> confidential information including personal information protected by
> federal and/or state privacy laws. It is intended only for the use of the
> addressee named above. If you are not the intended recipient, you are
> hereby notified that any review, dissemination, distribution or duplication
> of this communication is strictly prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message. Highmark Health is a Pennsylvania nonprofit
> corporation. This communication may come from Highmark Health or one of its
> subsidiaries or affiliated businesses.
>


Re: AE for Non UMLS Dictionary

2017-12-09 Thread James Masanz
If you download the UMLS Metathesaurus yourself and use the Dictionary Creator
GUI to create your own dictionary from UMLS, then cTAKES does not need to
connect to the UTS server to check a UMLS user ID/password.  The live check
that cTAKES does over the internet at runtime is only used when someone
uses a prebuilt dictionary, where cTAKES can't tell whether you have UMLS
credentials without contacting the UTS server.  But if you've downloaded
UMLS yourself and built your own dictionary, then cTAKES doesn't do that
runtime check, because in order to download the Metathesaurus from UTS you
already had to log into the UTS site.

-- James




On Fri, Dec 8, 2017 at 7:01 AM, Prakhar Gaur 
wrote:

> Hello,
>
> Are there any Analysis Engine available for Non UMLS dictionary ?
>
> Use case is for our application, if Internet connection is not available
> the pipeline (Specifically DefaultJcasTermAnnotator AE) will fail as UMLS
> user ID authentication will not happen.
> So as an alternative I want to have another AE that does not depend on
> UMLS authentication, as a failsafe to the one that relies on connectivity.
>
> I understand the annotation would be different and "lower quality" (less
> useful) from the UMLS dictionary dependent one.
>
> Regards,
>


Re: cTakes Side Effect extractor

2017-10-30 Thread James Masanz
Below are details of problems I ran across when I tried to run the side
effect annotator.
It might be too detailed to be of help to you, but perhaps not, or perhaps
someone else who knows more will chime in.

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Side+Effect

Trying to run this aggregate xml descriptor for side effects:

desc\ctakes-side-effect\desc\analysis_engine\SideEffectAggregateTAE_UMLS.xml

I tried loading that descriptor using CVD.

-  at load time, DrugCNP2LookupWindow.xml could not be found. I don't see
it within cTAKES 4.0. It was part of 3.1.
There is a copy at
https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-drug-ner/desc/analysis_engine/

- java.lang.RuntimeException: Attempting to load a FileResource from a jar.
The File to be loaded cannot be within a jar.
jar:file:/C:/cTAKES/4.0/apache-ctakes-4.0.0/lib/ctakes-side-effect-res-4.0.0.jar!/org/apache/ctakes/sideeffect/lookup/sideEffect_dictionary.txt

Sean F. made a change to trunk (revision 1803217) that will likely fix that
error. In the meantime, if you are using the cTAKES 4.0 convenience binary,
it looks like you can bypass that error by copying that sideeffect subtree
to under  resources/org/apache/ctakes/sideeffect/

The next thing seems to be that it expects the document to have section
headings and endings in a specific format.
There are examples in
https://apache.googlesource.com/ctakes/+/branches/ctakes-3.1.0/ctakes-assertion-zoner-res/src/main/resources/org/mitre/medfacts/uima/mayo_sections.xml


Here is a snippet of a made-up document that would get recognized.

[start section='20104']
Patient reports dizziness after taking celexa and Chlorzoxazone

After starting Celexa, patient complains of excess sweating

[end section='20104']
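
If your notes don't already carry those markers, a small wrapper can add them before the text goes into the pipeline. The sketch below just builds the string; the marker text is copied from the snippet above, while the SectionWrapper class and its wrap method are hypothetical names, and 20104 is simply the section ID used in the example.

```java
// Hypothetical helper, not part of cTAKES.
final class SectionWrapper {

    /** Surrounds the note text with start/end markers for the given section ID. */
    static String wrap(String sectionId, String body) {
        return "[start section='" + sectionId + "']\n"
                + body.trim() + "\n"
                + "[end section='" + sectionId + "']\n";
    }

    public static void main(String[] args) {
        String note = "Patient reports dizziness after taking celexa and Chlorzoxazone";
        System.out.print(wrap("20104", note));
    }
}
```

Whether a given section ID is one the side effect annotator looks at is something to check against the mayo_sections.xml file linked above.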

Then I'm getting an SQL error when cTAKES is trying to reference a column
called FIRST_WORD.

That column is in the database for the original dictionary lookup but not
the newer fast dictionary lookup.

Unfortunately it doesn't look like there is a simple fix  or workaround -
looks like it will require some digging.

-- James


On Sun, Oct 29, 2017 at 11:14 PM, Agarwal, Mahesh Kumar <
mkagar...@mgh.harvard.edu> wrote:

> Hi James,
>
> Thank you for your response. I am using version 4.0 and I am running it
> through CVD. For example, AggregatePlainFastUMLSProcessor works fine. I
> did go through the user installation guide and I (think I) understand the
> component dependencies, however, I am unsure how to put the various
> components together for side effect detection. Any help will be much
> appreciated.
>
> Best,
> Mahesh
>
>
> On Oct 28, 2017, at 9:07 PM, James Masanz  wrote:
>
>
> Which version of cTAKES did you download?
> Have you looked at the User Install Guide:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide
> If so, are you running cTAKES through a CVD, a piper file, a CPE, or are
> you launching it in some other way?
>
>
> On Wed, Oct 25, 2017 at 9:23 PM, Agarwal, Mahesh Kumar <
> mkagar...@mgh.harvard.edu> wrote:
>
>> Hi,
>>
>> I am new to cTAKES. I have clinical documents and I want to understand
>> the side effect of a certain class of drugs. I tried to use the
>> SideEffectSentenceAggregate but I keep getting resource initialization
>> exceptions. I  downloaded the code for the example temporal module provided
>> at http://54.68.117.30:8080/index.jsp and it works just fine.
>>
>> Can anyone please advise what I can do to get this to work?
>>
>> Best,
>> Mahesh
>>
>> The information in this e-mail is intended only for the person to whom it
>> is
>> addressed. If you believe this e-mail was sent to you in error and the
>> e-mail
>> contains patient information, please contact the Partners Compliance
>> HelpLine at
>> http://www.partners.org/complianceline . If the e-mail was sent to you
>> in error
>> but does not contain patient information, please contact the sender and
>> properly
>> dispose of the e-mail.
>>
>
>
>


Re: cTakes Side Effect extractor

2017-10-28 Thread James Masanz
Which version of cTAKES did you download?
Have you looked at the User Install Guide:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide
If so, are you running cTAKES through a CVD, a piper file, a CPE, or are
you launching it in some other way?


On Wed, Oct 25, 2017 at 9:23 PM, Agarwal, Mahesh Kumar <
mkagar...@mgh.harvard.edu> wrote:

> Hi,
>
> I am new to cTAKES. I have clinical documents and I want to understand the
> side effect of a certain class of drugs. I tried to use the
> SideEffectSentenceAggregate but I keep getting resource initialization
> exceptions. I  downloaded the code for the example temporal module provided
> at http://54.68.117.30:8080/index.jsp and it works just fine.
>
> Can anyone please advise what I can do to get this to work?
>
> Best,
> Mahesh
>
>


Re: cTAKES-generated annotations

2017-10-26 Thread James Masanz
The (separately downloadable) dictionaries for anatomical sites, procedures,
signs/symptoms, and disorders/diseases are built from
SNOMED-CT, ICD-9 and others, and cTAKES can assign UMLS Metathesaurus CUIs
and SNOMED-CT codes from those dictionaries.

Medications can be annotated with RxNorm codes and CUIs.

Those annotations are all described in A common type system for clinical
natural language processing
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575354/>

You can build your own dictionaries from other vocabularies within the UMLS
Metathesaurus using the Dictionary Creator GUI
<https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI>.

Semantic role labelling follows the conventions of PropBank
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575354/#B15>

Temporal annotations are based on TimeML:
http://www.aclweb.org/anthology/S13-2002
http://clear.colorado.edu/compsem/documents/THYME_guidelines.pdf


hope that helps
-- James Masanz


On Tue, Oct 24, 2017 at 1:14 PM, Marcello Bax UFMG 
wrote:

> Hi,
>  Could anyone clarify me if cTAKES-generated annotations follow some
> standard format or ontology, like Web Annotation Ontology (W3C), BioC, and
> others?
>   It would facilitate interoperability issues.
>
> Thanks,
> Marcello Bax
> Information Theory and Management Department - Information Science School
> - UFMG
> bax.eci.ufmg.br
>


Re: Setting the Lvg Resources Location in lvg.properties

2017-09-29 Thread James Masanz
No.  I think that when the database was converted to be compatible with the
more recent HSQLDB, the conversion program lost the readonly attribute.
I'll fix that. Thanks for letting everyone know it's working now for you
and what needs to be done.
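
For anyone else running several readers against the same database in the meantime: HSQLDB treats a database as read-only when its .properties file contains readonly=true. A minimal sketch of setting that with java.util.Properties follows. The class name and the command-line argument are hypothetical, and which .properties files to touch depends on your own layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of cTAKES or HSQLDB.
final class HsqlReadOnly {

    /** Returns a copy of the given properties with readonly=true added. */
    static Properties withReadOnly(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);
        copy.setProperty("readonly", "true");
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the database's .properties file (assumed to exist).
        Path file = Paths.get(args[0]);
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        try (OutputStream out = Files.newOutputStream(file)) {
            withReadOnly(props).store(out, "readonly set for concurrent readers");
        }
    }
}
```

Properties.store rewrites the whole file and drops any comments it had, so it is worth keeping a backup copy before running something like this.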


On Sep 29, 2017 5:27 PM, "Michael Trepanier"  wrote:

> Taking the resources out of my fat jar resolved this issue. I should add
> that as I'm running this pipeline in Spark I had to set the related HSQLDBs
> to read-only to permit simultaneous reads. Is there any reason they are not
> set to read-only to begin with?
>
> Mike
>
> On Thu, Sep 28, 2017 at 6:27 PM, James Masanz 
> wrote:
>
>>
>> I would expect that if you copy the LVG resources out of your UberJar, it
>> should resolve the issue.
>> Modifying the lvg.properties file generally causes problems.
>> Following the hints in the 30/Jul/17 23:06 update to CTAKES-445
>> <https://issues.apache.org/jira/browse/CTAKES-445>  should work without
>> your having to modify the lvg.properties file.
>>
>> I haven't tested the patch to CTAKES-445
>> <https://issues.apache.org/jira/browse/CTAKES-445> myself yet so I don't
>> know whether it takes care of the problem in this case. I do know that the
>> ctakes-lvg code does a change directory (cd)  to where it expects the LVG
>> resources to be, or at least that's what it used to do when I last looked
>> at it.  I suspect trying to cd into a jar is the problem you are seeing.
>> I'll have to revisit that when I look at that patch.
>>
>> -- James
>>
>>
>>
>> On Tue, Sep 26, 2017 at 5:53 PM, Michael Trepanier 
>> wrote:
>>
>>> I am attempting to run cTAKES from an executable UberJar. While the fast
>>> pipeline seems to run correctly (in terms of producing an output), when
>>> stepping through the LvgAnnotator related steps, cTAKES produces the below
>>> error.
>>>
>>> 26 Sep 2017 22:47:01  INFO LvgAnnotator - URL for lvg.properties =file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
>>> 26 Sep 2017 22:47:01  INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>>> 26 Sep 2017 22:47:01  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
>>> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl -   config file absolute path = /home/mike/jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
>>> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - cwd = /home/mike
>>> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - cd jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/
>>> ** Configuration Error: jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties (No such file or directory)
>>> ** Error: problem of opening/reading config file: 'jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties'. Use -x option to specify the config file path.
>>> ** Configuration Error: jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties (No such file or directory)
>>> ** Error: problem of opening/reading config file: 'jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties'. Use -x option to specify the config file path.
>>>
>>> Would taking the additional cTAKES resources out of the UberJar resolve
>>> this issue? And if so, can I use the lvg.properties file to set where these
>>> resources should be?
>>>
>>> Note, as mentioned before, this error does not cause cTAKES to crash; I
>>> am just worried it may be impacting the output. As well, I have implemented
>>> the patch outlined at https://issues.apache.org/jira/browse/CTAKES-445
>>>
>>>
>>> Regards,
>>>
>>> Mike
>>>
>>> --
>>> Mike Trepanier | Big Data Engineer | MetiStream, Inc. |
>>> m...@metistream.com | 845 - 270 - 3129 (m) |
>>> www.metistream.com
>>>
>>
>>
>
>
> --
> Mike Trepanier | Big Data Engineer | MetiStream, Inc. |
> m...@metistream.com | 845 - 270 - 3129 (m) |
> www.metistream.com
>


Re: adding PrettyTextWriter output to piper file

2017-09-28 Thread James Masanz
If you haven't used the html writer before,  I suggest you try it

// Html writer
add pretty.html.HtmlTextWriter

If that doesn't meet your needs, and you truly want PrettyTextWriter,  I
think it would be added to a piper file like this:

// Add what is used by ctakes-core/desc/cas_consumer/PrettyTextWriter.xml
add pretty.plaintext.PrettyTextWriterUima

-- James

On Sun, Sep 24, 2017 at 4:28 PM, Francisco Pereira <
francisco.pere...@gmail.com> wrote:

> Hi,
>
> I'm a novice user, and have been working with a slightly modified version
> of the default clinical pipeline, as defined in DefaultFastPipeline piper
> file.
>
> In that file, I specify FileTreeXmiWriter to generate output. I was
> wondering how one might, instead, generate output like what we would get
> using
>
> desc/ctakes-core/desc/cas_consumer/PrettyTextWriter.xml
>
> within a piper file. Is there a way of doing this, or does one have to use
> XML files?
>
> thank you for your help!
> Francisco
>
>
>


Re: Setting the Lvg Resources Location in lvg.properties

2017-09-28 Thread James Masanz
I would expect that if you copy the LVG resources out of your UberJar, it
should resolve the issue.
Modifying the lvg.properties file generally causes problems.
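Following the hints in the 30/Jul/17 23:06 update to CTAKES-445
<https://issues.apache.org/jira/browse/CTAKES-445> should work without
your having to modify the lvg.properties file.

I haven't tested the patch to CTAKES-445
<https://issues.apache.org/jira/browse/CTAKES-445> myself yet so I don't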
know whether it takes care of the problem in this case. I do know that the
ctakes-lvg code does a change directory (cd)  to where it expects the LVG
resources to be, or at least that's what it used to do when I last looked
at it.  I suspect trying to cd into a jar is the problem you are seeing.
I'll have to revisit that when I look at that patch.
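
The "config file absolute path = /home/mike/jar:file:/home/mike/..." line in the log you posted is what you get when a jar: URL string is handed to java.io.File as if it were an ordinary path. A minimal sketch of that behaviour, using only the standard library; JarPathDemo and isInsideJar are names made up here, and this is not the ctakes-lvg code.

```java
import java.io.File;

// Hypothetical demo class, not part of cTAKES.
final class JarPathDemo {

    /** True when the location points inside a jar rather than at a file on disk. */
    static boolean isInsideJar(String location) {
        return location.startsWith("jar:") || location.contains(".jar!/");
    }

    public static void main(String[] args) {
        String loc = "jar:file:/home/mike/ctakes-assembly-4.0.jar!"
                + "/org/apache/ctakes/lvg/data/config/lvg.properties";
        File asFile = new File(loc);
        System.out.println("inside jar:    " + isInsideJar(loc));
        // File sees a relative path, so the working directory gets prepended.
        System.out.println("absolute path: " + asFile.getAbsolutePath());
        System.out.println("exists:        " + asFile.exists());
    }
}
```

A check like isInsideJar is one way a caller could notice the situation early and point the code at resources copied out to the file system instead.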

-- James



On Tue, Sep 26, 2017 at 5:53 PM, Michael Trepanier 
wrote:

> I am attempting to run cTAKES from an executable UberJar. While the fast
> pipeline seems to run correctly (in terms of producing an output), when
> stepping through the LvgAnnotator related steps, cTAKES produces the below
> error.
>
> 26 Sep 2017 22:47:01  INFO LvgAnnotator - URL for lvg.properties =file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 26 Sep 2017 22:47:01  INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 26 Sep 2017 22:47:01  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl -   config file absolute path = /home/mike/jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - cwd = /home/mike
> 26 Sep 2017 22:47:01  INFO LvgCmdApiResourceImpl - cd jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/
> ** Configuration Error: jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties (No such file or directory)
> ** Error: problem of opening/reading config file: 'jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties'. Use -x option to specify the config file path.
> ** Configuration Error: jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties (No such file or directory)
> ** Error: problem of opening/reading config file: 'jar:file:/home/mike/ctakes-assembly-4.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties'. Use -x option to specify the config file path.
>
> Would taking the additional cTAKES resources out of the UberJar resolve
> this issue? And if so, can I use the lvg.properties file to set where these
> resources should be?
>
> Note, as mentioned before, this error does not cause cTAKES to crash; I am
> just worried it may be impacting the output. As well, I have implemented
> the patch outlined at https://issues.apache.org/jira/browse/CTAKES-445
>
>
> Regards,
>
> Mike
>
> --
> Mike Trepanier | Big Data Engineer | MetiStream, Inc. |
> m...@metistream.com | 845 - 270 - 3129 (m) |
> www.metistream.com
>


Re: UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid

2017-09-04 Thread James Masanz
I ran a test on Windows using cTAKES 4.0 (from the binary download) and my
UMLS user ID validates OK:

04 Sep 2017 15:15:32  INFO UmlsUserApprover - Checking UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user jamesmasanz:
..04 Sep 2017 15:15:33  INFO UmlsUserApprover -   UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user jamesmasanz has
been validated

When I get a UMLS user validation error, things I check are
 - is the site/service up?
 - can I login to the interactive site with my credentials? Look for "Sign
In" on the upper right of  https://uts.nlm.nih.gov/
 - are there special characters in my password (or user ID) that could
cause a problem for the shell/script?

Also, there's no need to specify  ctakes.umlsvendor  or  ctakes.umlsaddr  in
the script, so it's better not to, for clarity.
localhost  isn't what you'd want there, but it looks like, from the log you
posted, that -Dctakes.umlsaddr is not having an effect  anyway.
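
For the special-characters check in the list above, here is a rough sketch of scanning a value for characters that common shells interpret. The character list is an assumption for illustration, not an exhaustive or official one, and the ShellCharCheck class is made up; ctakes.umlspw is the property name used in the script quoted below.

```java
// Hypothetical helper, not part of cTAKES.
final class ShellCharCheck {

    // Characters that bash or cmd commonly treat specially (illustrative list only).
    private static final String RISKY = "$`\"'\\!&|;<>()*?~#%^ ";

    /** Returns each risky character found in the value, once, in order of first appearance. */
    static String riskyCharsIn(String value) {
        StringBuilder found = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (RISKY.indexOf(c) >= 0 && found.indexOf(String.valueOf(c)) < 0) {
                found.append(c);
            }
        }
        return found.toString();
    }

    public static void main(String[] args) {
        // Reads the same system property the cTAKES scripts pass with -D.
        String pw = System.getProperty("ctakes.umlspw", "");
        String risky = riskyCharsIn(pw);
        System.out.println(risky.isEmpty()
                ? "No shell-special characters found."
                : "Characters that may need quoting or escaping: " + risky);
    }
}
```

Keep in mind that the shell may already have altered the value by the time Java sees it, so a clean result here does not prove the password arrived intact.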



On Mon, Sep 4, 2017 at 9:37 AM, Alexandru Zbarcea  wrote:

> Hi,
>
> I'm trying to run cTAKES following the documentation [1], and even after
> getting an UMLS Account, the runctakesCVD loads with exception:
> "org.apache.uima.resource.ResourceInitializationException: Initialization
> of annotator class 
> "org.apache.ctakes.dictionary.lookup2.ae.DefaultCasTermAnnotator"
> failed", but the only error in the log is:
>
> ./bin/runctakesCVD.sh -desc desc/ctakes-clinical-pipeline/
> desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> (...)
> 03 Sep 2017 22:00:37  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 03 Sep 2017 22:00:38  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 03 Sep 2017 22:00:38  INFO ContextDependentTokenizerAnnotator - Finite
> state machines loaded.
> 03 Sep 2017 22:00:38  INFO AbstractJCasTermAnnotator - Using dictionary
> lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 03 Sep 2017 22:00:38  INFO AbstractJCasTermAnnotator - Exclusion tagset
> loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN
> VBP VBZ WDT WP WPS WRB
> 03 Sep 2017 22:00:38  INFO AbstractJCasTermAnnotator - Using minimum term
> text span: 3
> 03 Sep 2017 22:00:38  INFO AbstractJCasTermAnnotator - Using Dictionary
> Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 03 Sep 2017 22:00:38  INFO DictionaryDescriptorParser - Parsing dictionary
> specifications:
> 03 Sep 2017 22:00:38  INFO UmlsUserApprover - Checking UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user :
> .03 Sep 2017 22:00:39 ERROR UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user
>  with 
>
> Tunneling the connection I can see:
>
> POST /restful/isValidUMLSUser HTTP/1.1
> User-Agent: Java/1.8.0_144
> Host: uts-ws.nlm.nih.gov:80
> Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
> Connection: keep-alive
> Content-type: application/x-www-form-urlencoded
> Content-Length: 76
>
> licenseCode=&user=&password=
>
> HTTP/1.0 301 Moved Permanently
> Location: https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser
> Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
> Server: BigIP
> Connection: Keep-Alive
> Content-Length: 0
>
> Where instead of ,  and  are used the
> valid entries from the https://uts.nlm.nih.gov/ account.
>
> and the runctakesCVD.sh is changed as:
>
> java -Dctakes.umlsuser= \
>  -Dctakes.umlspw="" \
>  -Dctakes.umlsvendor="" \
>  -Dctakes.umlsaddr="http://localhost:8080/restful/isValidUMLSUser" \
>  -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/* \
>  -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml \
>  -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel \
>  -Dawt.useSystemAAFontSettings=on \
>  -Dsun.java2d.dpiaware=true \
>  -Xms512M \
>  -Xmx3g \
>  org.apache.uima.tools.cvd.CVD "$@"
>
> Going to: https://uts.nlm.nih.gov/services/nwsSemanticNetwork, I see that
> the wsdl [2] (all WSDLs [3]) is no longer available. For this reason I
> wonder if the API is still supported.
>
> OS: archlinux
> Java: openjdk version "1.8.0_144"
> OpenJDK Runtime Environment (build 1.8.0_144-b01)
> OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)
> Network: no-proxy
> UMLS Account: , , 
>
> Regards,
> Alexandru Zbarcea
>
> [1] - https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
> [2] - https://uts.nlm.nih.gov:443/restful/isValidUMLSUser?wsdl
> [3] - https://uts.nlm.nih.gov/services/nwsSemanticNetwork
>


Re: cTAKES Fast Pipeline Failing

2017-09-01 Thread James Masanz
I think that in late April Sean Finan fixed a problem that was resulting in
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7

Are you using cTAKES 4.0 (either from the convenience binary download or as
a Maven dependency), or are you using cTAKES in some other way?
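
For what it's worth, -7 is what String.substring reports when the end index works out to 0 minus 7, that is, an empty string and a seven-character suffix, which matches the getConnectionUrl code quoted below. The sketch reproduces that arithmetic and shows a guarded variant. The ".script" value for the extension is an assumption (it is seven characters long and is one of the file suffixes HSQLDB uses), and stripSuffix is a made-up helper, not the actual cTAKES fix.

```java
// Hypothetical demo class, not part of cTAKES.
final class SuffixDemo {

    // Assumed value; the real constant lives in cTAKES' JdbcConnectionFactory.
    static final String HSQL_DB_EXT = ".script";

    /** Strips the suffix, or fails with a descriptive message instead of a negative index. */
    static String stripSuffix(String url, String suffix) {
        if (!url.endsWith(suffix)) {
            throw new IllegalArgumentException(
                    "Expected a URL ending in " + suffix + " but got '" + url + "'");
        }
        return url.substring(0, url.length() - suffix.length());
    }

    public static void main(String[] args) {
        String empty = "";
        try {
            // Same arithmetic as the failing call: 0 - 7 = -7.
            empty.substring(0, empty.length() - HSQL_DB_EXT.length());
        } catch (StringIndexOutOfBoundsException e) {
            // The exact wording of this message depends on the Java version.
            System.out.println("Unguarded: " + e.getMessage());
        }
        System.out.println("Guarded:   " + stripSuffix("file:/db/sno_rx_16ab.script", HSQL_DB_EXT));
    }
}
```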

-- James


On Fri, Sep 1, 2017 at 3:13 PM, Michael Trepanier 
wrote:

> Hi All,
>
> We've been attempting to scale our cTAKES Pipeline on top of Spark, so
> we've switched from using the "getDefaultPipeline" method to the
> "getFastPipeline" method to boost the processing speed. However, while the
> default pipeline works fine with Spark, the fast pipeline is throwing the
> below error (edited down to the cTAKES portion of the stack trace):
>
>
> Caused by: org.apache.uima.resource.ResourceInitializationException:
> MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
> java.util.PropertyResourceBundle,
> key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.
> UmlsJdbcRareWordDictionary
> at org.apache.ctakes.dictionary.lookup2.ae.
> AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
> at org.apache.uima.analysis_engine.impl.
> PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(
> PrimitiveAnalysisEngine_impl.java:266)
> ... 44 more
> Caused by: 
> org.apache.uima.analysis_engine.annotator.AnnotatorContextException:
> MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
> java.util.PropertyResourceBundle,
> key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.
> UmlsJdbcRareWordDictionary
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.
> java:199)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.
> java:156)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.
> java:128)
> at org.apache.ctakes.dictionary.lookup2.ae.
> AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
> ... 45 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.
> java:196)
> ... 48 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
> range: -7
> at java.lang.String.substring(String.java:1967)
> at org.apache.ctakes.dictionary.lookup2.util.
> JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
> at org.apache.ctakes.dictionary.lookup2.util.
> JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
> at org.apache.ctakes.dictionary.lookup2.dictionary.
> UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
> ... 53 more
>
>
> So, looking in "getConnectionUrl," we have this method:
>
> static private String getConnectionUrl( final String jdbcUrl ) throws SQLException {
>    final String urlDbPath = jdbcUrl.substring( HSQL_FILE_PREFIX.length() );
>    final String urlFilePath = urlDbPath + HSQL_DB_EXT;
>    try {
>       final URL url = FileLocator.getResource( urlFilePath );
>       final String urlString = url.toExternalForm();
>       return urlString.substring( 0, urlString.length() - HSQL_DB_EXT.length() ); // <---
>    } catch ( FileNotFoundException fnfE ) {
>       throw new SQLException( "No Hsql DB exists at Url", fnfE );
>    }
> }
>
> The substring method indicated above appears to be what is causing the
> error - for some reason the "urlString" variable has a length of zero. This
> seems to indicate that there is something wrong with the cTAKES resources.
> However, that isn't making much sense to me as the default pipeline, which
> also relies on the resources package, is working fine. Has anyone
> encountered something like this before? Does the fast pipeline require some
> additional resources?
>
> As well, for the Spark implementation, we've put the cTAKES jars and
> resources on each executor at the same location, and are specifying this
> location on the executor classpath.
>
> Thanks,
>
> Mike
> --
> Mike Tr
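
The failure mode described in the message above can be reproduced outside
cTAKES. The following is a minimal sketch, not the cTAKES implementation. It
assumes HSQL_DB_EXT is ".script" (seven characters, which would match the -7
in the trace); the class name, the method names stripUnguarded and
stripGuarded, and the sample path are hypothetical. It shows that stripping
the extension from an empty string yields a negative end index, and how a
suffix check could turn that into a more descriptive error.

```java
final class SuffixStripSketch {
    // Assumption: the extension is ".script" (7 characters), matching the -7 in the trace.
    static final String HSQL_DB_EXT = ".script";

    /** Unguarded version: same index arithmetic as the quoted getConnectionUrl method. */
    static String stripUnguarded(String urlString) {
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    /** Guarded version: rejects unexpected input before computing any index. */
    static String stripGuarded(String urlString) {
        if (urlString == null || !urlString.endsWith(HSQL_DB_EXT)) {
            throw new IllegalArgumentException(
                "Expected a URL ending in " + HSQL_DB_EXT + " but got: '" + urlString + "'");
        }
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    public static void main(String[] args) {
        // Hypothetical path, used only for illustration.
        System.out.println(stripGuarded("file:/tmp/example/mydict.script"));
        try {
            // An empty string gives an end index of 0 - 7 = -7.
            stripUnguarded("");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Unguarded call failed: " + e.getMessage());
        }
    }
}
```

A check like this would not fix a missing or misplaced dictionary resource,
but it would report the unexpected URL directly instead of a bare index error.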

Re: Problem running ctakes

2017-08-28 Thread James Masanz
I updated the guide. Thanks.


On Aug 28, 2017 10:01 AM, "SHREY GUPTA"  wrote:

> It worked. Thanks
>
> Shrey Gupta
> MTech Scholar
> IIIT Delhi,
> Okhla Phase 3, New Delhi - 110020
> Mob: +91 9873948565
>
> On Sun, Aug 27, 2017 at 1:12 AM, David Kincaid 
> wrote:
>
>> Looks like the install guide command is missing the "-desc" option in
>> the command line. Try this:
>>
>> bin/runctakesCVD.sh -desc desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
>>
>> On Sat, Aug 26, 2017 at 1:45 PM, SHREY GUPTA 
>> wrote:
>>
>>> Hi
>>>
>>> I have followed the steps as written in the cTAKES install guide.
>>> I am having trouble when I run it. I am running it on Ubuntu 16.
>>> The following error is shown.
>>>
>>> Regards
>>> Shrey Gupta
>>> MTech Scholar
>>> IIIT Delhi,
>>> Okhla Phase 3, New Delhi - 110020
>>> Mob: +91 9873948565
>>>
>>
>>
>


Re: cTakesHsql.xml file missing in cTAKES 4.0 ?

2017-05-03 Thread James Masanz
I'll update the documentation page. thanks!

On Tue, May 2, 2017 at 11:34 PM, Narayan Bhat 
wrote:

> James
>
> The "cTAKES 4.0 - Fast Dictionary Lookup" section of the documentation
> refers to the file I mentioned. Also, to use a custom dictionary, the
> document says we need to point to the custom dictionary XML file using the
> fast dictionary parameter LookupXml, but how do we set this parameter?
>
> I tried creating a custom dictionary selecting only SNOMED codes. The
> custom dictionary XML file was created under the
> apache-ctakes-4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast
> directory, but the document says that the XML file would be created in a
> directory with the same name (DictionaryName) as the custom dictionary
> file, i.e.
> org/apache/ctakes/dictionary/lookup/fast/DictionaryName/DictionaryName.xml
>
>
> It would be great if you could clarify.
>
> Thanks and Regards
> Narayan
>
> On Wed, May 3, 2017 at 1:01 AM, James Masanz 
> wrote:
>
>> No, the cTakesHsql.xml should not be needed for 4.0
>>
>> If you create your own dictionary then your own custom_name.xml is
>> automatically generated and placed in the correct resources/ directory.
>>
>> Is there 4.0 documentation telling you to update that file, or are you
>> just going from what you did with a previous release?
>>
>>
>>
>> On Tue, May 2, 2017 at 1:01 AM, Narayan Bhat wrote:
>>
>>>
>>> Hi
>>>
>>> The resource configuration file cTakesHsql.xml is not
>>> in resources/org/apache/ctakes/dictionary/lookup/fast directory (in
>>> cTAKES 4.0). Should I reinstall cTAKES 4.0?
>>>
>>>
>>> --
>>> Thanks and Regards
>>> Narayan L Bhat
>>> +91 9740237967
>>>
>>
>>
>
>
> --
> Thanks and Regards
> Narayan L Bhat
> +91 9740237967
>


Re: cTakesHsql.xml file missing in cTAKES 4.0 ?

2017-05-02 Thread James Masanz
No, the cTakesHsql.xml should not be needed for 4.0

If you create your own dictionary then your own custom_name.xml is
automatically generated and placed in the correct resources/ directory.

Is there 4.0 documentation telling you to update that file, or are you just
going from what you did with a previous release?



On Tue, May 2, 2017 at 1:01 AM, Narayan Bhat 
wrote:

>
> Hi
>
> The resource configuration file cTakesHsql.xml is not
> in resources/org/apache/ctakes/dictionary/lookup/fast directory (in
> cTAKES 4.0). Should I reinstall cTAKES 4.0?
>
>
> --
> Thanks and Regards
> Narayan L Bhat
> +91 9740237967
>


[ANNOUNCE] Re: cTAKES 4.0.0 Release

2017-04-25 Thread James Masanz
While there are links on the downloads page to the documentation on the
cTAKES wiki, if anyone wants the direct link to the cTAKES 4.0 wiki, it is:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0

Also adding [ANNOUNCE] to the subject line for anyone filtering on that.


On Mon, Apr 24, 2017 at 9:02 AM, Murali Minnah  wrote:

> The Apache cTAKES team is pleased to announce the availability of the
> 4.0.0 release.
>
> For the complete release notes, please visit
> https://s.apache.org/ctakes-4.0.0-release-notes
>
> Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
> an open-source natural language processing system for information
> extraction from electronic medical record clinical free-text.
>
> The release can be downloaded from
> http://ctakes.apache.org/downloads.cgi
>
> For further information, please visit the project website at
> http://ctakes.apache.org/
>
> -- The Apache cTAKES Team
>


Re: cTAKES 4.0.0 Release

2017-04-25 Thread James Masanz
Thanks Tim for noticing. I've created the missing page

On Tue, Apr 25, 2017 at 7:17 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Looks like the link to the "Dictionary Install Guide" goes to a renamed
> wiki page. And the page is for the old dictionary. Should we just remove
> that link?
>
> Tim
>
>
>
>
> --
> *From:* James Masanz 
> *Sent:* Monday, April 24, 2017 12:18 PM
> *To:* user@ctakes.apache.org
> *Subject:* Re: cTAKES 4.0.0 Release
>
>
> Links on the download page should all be updated now.  Sorry for the
> inconvenience.
>
> On Mon, Apr 24, 2017 at 11:59 AM, James Masanz 
> wrote:
>
>> We're updating that now.
>> should be fixed in a few minutes.
>>
>> On Mon, Apr 24, 2017 at 11:56 AM, Lin, Chen <
>> chen@childrens.harvard.edu> wrote:
>>
>>> Dear cTAKES Team,
>>>
>>> Forgive me if I am wrong. I tried the "release can be downloaded from"
>>> link and it linked me to the download page of cTAKES, which is still at
>>> version 3.2.2?
>>> I tried the "User Installation" and "Source Code" links; both were
>>> pointing to 3.2.2. Please help double check. Many thanks!
>>>
>>> Best,
>>> Chen
>>>
>>> On 4/24/17, 9:02 AM, "Murali Minnah"  wrote:
>>>
>>> >The Apache cTAKES team is pleased to announce the availability of the
>>> >4.0.0 release.
>>> >
>>> >For the complete release notes, please visit
>>> >https://s.apache.org/ctakes-4.0.0-release-notes
>>> >
>>> >Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
>>> >an open-source natural language processing system for information
>>> >extraction from electronic medical record clinical free-text.
>>> >
>>> >The release can be downloaded from
>>> >http://ctakes.apache.org/downloads.cgi
>>> >
>>> >For further information, please visit the project website at
>>> >http://ctakes.apache.org/
>>> >
>>> >-- The Apache cTAKES Team
>>>
>>>
>>
>


Re: cTAKES 4.0.0 Release

2017-04-25 Thread James Masanz
I'll create a wiki page that points people to the dictionary GUI and
suggests they use the Fast Dictionary Lookup, and I'll add a link from
there to the Original Dictionary Lookup
<https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Original+Dictionary+Lookup>
page too, just in case someone is looking for that.

This will at least prevent people from getting a Page Not Found error. If
people don't like this approach and want to remove or update the link, I'd
be fine with that.

On Tue, Apr 25, 2017 at 7:17 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Looks like the link to the "Dictionary Install Guide" goes to a renamed
> wiki page. And the page is for the old dictionary. Should we just remove
> that link?
>
> Tim
>
>
>
>
> --
> *From:* James Masanz 
> *Sent:* Monday, April 24, 2017 12:18 PM
> *To:* user@ctakes.apache.org
> *Subject:* Re: cTAKES 4.0.0 Release
>
>
> Links on the download page should all be updated now.  Sorry for the
> inconvenience.
>
> On Mon, Apr 24, 2017 at 11:59 AM, James Masanz 
> wrote:
>
>> We're updating that now.
>> should be fixed in a few minutes.
>>
>> On Mon, Apr 24, 2017 at 11:56 AM, Lin, Chen <
>> chen@childrens.harvard.edu> wrote:
>>
>>> Dear cTAKES Team,
>>>
>>> Forgive me if I am wrong. I tried the "release can be downloaded from"
>>> link and it linked me to the download page of cTAKES, which is still at
>>> version 3.2.2?
>>> I tried the "User Installation" and "Source Code" links; both were
>>> pointing to 3.2.2. Please help double check. Many thanks!
>>>
>>> Best,
>>> Chen
>>>
>>> On 4/24/17, 9:02 AM, "Murali Minnah"  wrote:
>>>
>>> >The Apache cTAKES team is pleased to announce the availability of the
>>> >4.0.0 release.
>>> >
>>> >For the complete release notes, please visit
>>> >https://s.apache.org/ctakes-4.0.0-release-notes
>>> >
>>> >Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
>>> >an open-source natural language processing system for information
>>> >extraction from electronic medical record clinical free-text.
>>> >
>>> >The release can be downloaded from
>>> >http://ctakes.apache.org/downloads.cgi
>>> >
>>> >For further information, please visit the project website at
>>> >http://ctakes.apache.org/
>>> >
>>> >-- The Apache cTAKES Team
>>>
>>>
>>
>


Re: cTAKES 4.0.0 Release

2017-04-24 Thread James Masanz
Links on the download page should all be updated now.  Sorry for the
inconvenience.

On Mon, Apr 24, 2017 at 11:59 AM, James Masanz 
wrote:

> We're updating that now.
> should be fixed in a few minutes.
>
> On Mon, Apr 24, 2017 at 11:56 AM, Lin, Chen wrote:
>
>> Dear cTAKES Team,
>>
>> Forgive me if I am wrong. I tried the "release can be downloaded from" link
>> and it linked me to the download page of cTAKES, which is still at version
>> 3.2.2?
>> I tried the "User Installation" and "Source Code" links; both were pointing
>> to 3.2.2. Please help double check. Many thanks!
>>
>> Best,
>> Chen
>>
>> On 4/24/17, 9:02 AM, "Murali Minnah"  wrote:
>>
>> >The Apache cTAKES team is pleased to announce the availability of the
>> >4.0.0 release.
>> >
>> >For the complete release notes, please visit
>> >https://s.apache.org/ctakes-4.0.0-release-notes
>> >
>> >Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
>> >an open-source natural language processing system for information
>> >extraction from electronic medical record clinical free-text.
>> >
>> >The release can be downloaded from
>> >http://ctakes.apache.org/downloads.cgi
>> >
>> >For further information, please visit the project website at
>> >http://ctakes.apache.org/
>> >
>> >-- The Apache cTAKES Team
>>
>>
>


Re: cTAKES 4.0.0 Release

2017-04-24 Thread James Masanz
We're updating that now.
should be fixed in a few minutes.

On Mon, Apr 24, 2017 at 11:56 AM, Lin, Chen 
wrote:

> Dear cTAKES Team,
>
> Forgive me if I am wrong. I tried the "release can be downloaded from" link
> and it linked me to the download page of cTAKES, which is still at version
> 3.2.2?
> I tried the "User Installation" and "Source Code" links; both were pointing
> to 3.2.2. Please help double check. Many thanks!
>
> Best,
> Chen
>
> On 4/24/17, 9:02 AM, "Murali Minnah"  wrote:
>
> >The Apache cTAKES team is pleased to announce the availability of the
> >4.0.0 release.
> >
> >For the complete release notes, please visit
> >https://s.apache.org/ctakes-4.0.0-release-notes
> >
> >Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
> >an open-source natural language processing system for information
> >extraction from electronic medical record clinical free-text.
> >
> >The release can be downloaded from
> >http://ctakes.apache.org/downloads.cgi
> >
> >For further information, please visit the project website at
> >http://ctakes.apache.org/
> >
> >-- The Apache cTAKES Team
>
>


Re: Customized dictionary

2017-04-21 Thread James Masanz
These are more aimed at developers, but in case this is helpful:

There is a brief overview of using a bar separated file here:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+Developer+Install+Guide?preview=/44302415/68718471/cTAKES.Hackphlet.3.2.2.pdf

and a related email thread here:
http://markmail.org/message/vtjchjyyre73p2s7

Hope that helps,
James

On Thu, Apr 20, 2017 at 10:03 AM, Nikolas Pontikos 
wrote:

> Yes that would be great.  I also would like to remap abbreviations.  For
> example VA from Venous Aneurysms to Visual Acuity.
>
>
> On Thu, 20 Apr 2017 at 14:37 Sherri Matis Mitchell <
> sherr...@datastarinsights.com> wrote:
>
>> I would also be interested in this. I am working with a lot of findings
>> that aren’t in UMLS so I need to add my own dictionary with semantic types.
>>
>>
>>
>> Sherri
>>
>>
>>
>> *From:* Fardad Gharghabi [mailto:fghar...@jhu.edu]
>> *Sent:* Monday, April 3, 2017 10:44 AM
>> *To:* user@ctakes.apache.org
>> *Subject:* Customized dictionary
>>
>>
>>
>> Hello,
>>
>> I would really appreciate some information on how to create our own
>> dictionaries based on local phrases.
>>
>> Is there any webpage or document that explains, step by step, how to
>> create a new dictionary and how to run it?
>>
>>
>>
>> Thanks,
>>
>> Farhad
>>
>


Re: FW: demo server not working

2017-03-31 Thread James Masanz
Sorry, I was fooled because multiple word examples work if you change the
dropdown from Pretty Print to something else.
But Pretty Print is broken for me too for multiple-word entries.
Hopefully someone else on this list knows more about those demos.


On Fri, Mar 31, 2017 at 3:38 PM, Alden Gordon  wrote:

> It looks like submitting any two terms separated by whitespace raises the
> exception.
>
> On Fri, Mar 31, 2017 at 3:13 PM, James Masanz 
> wrote:
>
>> If I follow either of the links [1,2] on that page and just hit the
>> submit button, I get an exception, but if I replace the text in the text
>> box with some text, even text as simple as just the word pain, they both
>> seem to work fine for me.  Do they work for you if you enter some text?
>>
>> [1] http://52.26.219.218:8080/index.jsp
>> [2] http://52.27.22.206:8080/index.jsp
>>
>> On Fri, Mar 31, 2017 at 2:21 PM, Sherri Matis Mitchell <
>> sherr...@datastarinsights.com> wrote:
>>
>>>
>>>
>>>
>>>
>>> Demos at http://healthnlp.github.io/examples/ are not working
>>>
>>>
>>>
>>> Sherri
>>>
>>>
>>>
>>
>>
>
>
> --
> Alden Gordon
> Analytics Lead | 860.402.6572 | al...@rubiconmd.com
> 
> Visit Care Without Constraints
> <https://www.linkedin.com/company/care-without-constraints> to learn how
> eConsults are removing barriers to care.
>


Re: FW: demo server not working

2017-03-31 Thread James Masanz
If I follow either of the links [1,2] on that page and just hit the submit
button, I get an exception, but if I replace the text in the text box with
some text, even text as simple as just the word pain, they both seem to
work fine for me.  Do they work for you if you enter some text?

[1] http://52.26.219.218:8080/index.jsp
[2] http://52.27.22.206:8080/index.jsp

On Fri, Mar 31, 2017 at 2:21 PM, Sherri Matis Mitchell <
sherr...@datastarinsights.com> wrote:

>
>
>
>
> Demos at http://healthnlp.github.io/examples/ are not working
>
>
>
> Sherri
>
>
>


Re: As promised

2017-03-25 Thread James Masanz
John,

Thanks! The blog post mentions it is "only compatible with Java 7". Does
that mean literally only 7, or does it mean "7 or later"?
If it means exactly 7, do you remember which piece(s) have that
requirement?

-- James

On Sat, Mar 25, 2017 at 3:38 PM, John Green 
wrote:

> A very brief instruction set for setting up cTakes as a RESTful server on
> a VirtualBox - http://cyborgdoctor.blogspot.com/
>


Re: detection of chemical/drug entities

2017-03-23 Thread James Masanz
Sorry, parts of my previous reply were incorrect.

I forgot that as of 3.1, the annotations made by the drugner component are
converted into org.apache.ctakes.typesystem.type.textsem.MedicationMention
annotations. So the AggregatePlaintextFastUMLSProcessor is working as
expected -- instead of looking for medications as annotations of type
org.apache.ctakes.drugner.type.*, use
org.apache.ctakes.typesystem.type.textsem.MedicationMention.

As I mentioned before, whether nucleotide is classified as a medication
depends on the dictionary used. It appears nucleotide is not within the
default dictionary as a medication. However I do see annotations being
created for glycine and aspartic acid.

org.apache.ctakes.typesystem.type.textsem.MedicationMention is found under
org.apache.ctakes.typesystem.type.textsem.EventMention which is found under
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation

To get attributes such as strength, dose, etc., use the features of the
MedicationMention annotation, such as medicationStrength, which can
reference an annotation of type MedicationStrengthModifier.

Regards,
-- James
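
The type hierarchy described above means that a query for EventMention or
IdentifiedAnnotation would also return MedicationMention instances, while a
query for MedicationMention returns only medications. The sketch below
illustrates that filtering with plain-Java stand-in classes. These are not
the real cTAKES type system classes (which are UIMA annotations and need a
JCas); the class names mirror the ones in this message, while selectByType,
the field layout, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TypeHierarchySketch {
    // Stand-ins mirroring the hierarchy named in the message; NOT the real cTAKES classes.
    static class IdentifiedAnnotation {
        final String coveredText;
        IdentifiedAnnotation(String coveredText) { this.coveredText = coveredText; }
    }

    static class EventMention extends IdentifiedAnnotation {
        EventMention(String coveredText) { super(coveredText); }
    }

    static class MedicationStrengthModifier {
        final String text;
        MedicationStrengthModifier(String text) { this.text = text; }
    }

    static class MedicationMention extends EventMention {
        // Null when no strength was found for this mention.
        final MedicationStrengthModifier medicationStrength;
        MedicationMention(String coveredText, MedicationStrengthModifier strength) {
            super(coveredText);
            this.medicationStrength = strength;
        }
    }

    /** Hypothetical helper: returns every annotation that is an instance of the given type. */
    static <T extends IdentifiedAnnotation> List<T> selectByType(
            List<? extends IdentifiedAnnotation> annotations, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (IdentifiedAnnotation a : annotations) {
            if (type.isInstance(a)) {
                result.add(type.cast(a));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<IdentifiedAnnotation> all = Arrays.asList(
            new EventMention("some event"),
            new MedicationMention("glycine", null),
            new MedicationMention("drug X", new MedicationStrengthModifier("10 mg")));
        for (MedicationMention m : selectByType(all, MedicationMention.class)) {
            String strength = m.medicationStrength == null ? "(none)" : m.medicationStrength.text;
            System.out.println(m.coveredText + " strength=" + strength);
        }
    }
}
```

In a real pipeline the same idea applies to the annotation index: select by
the MedicationMention type rather than by the drugner types, and check the
strength feature for null before reading it.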

On Thu, Mar 23, 2017 at 9:35 AM, MAZE, Christian 
wrote:

> Hi,
>
>
>
> I am using the v3.2.2 ctakes.
>
> I copied the v3.2.1.1 resources.
>
>
>
> I used another tool for extracting chemical/drug entities and when
> processing my input text, I got the following results:
>
> 1092   1107   nucleotide 1517
>
> 1165   1178   aspartic acid
>
> 1184   1191   glycine
>
> 1195   1204   codon 506
>
> 1254   1257   Asp
>
>
>
> My input test text is :
>
> A novel missense mutation Asp506Gly in Exon 13 of the F11 gene in an
> asymptomatic Korean woman with mild factor XI deficiency. Factor XI (FXI)
> deficiency is a rare autosomal recessive coagulation disorder most commonly
> found in Ashkenazi and Iraqi Jews, but it is also found in other ethnic
> groups. It is a trauma or surgery-related bleeding disorder, but
> spontaneous bleeding is rarely seen. The clinical manifestation of bleeding
> in FXI deficiency cases is variable and seems to poorly correlate with
> plasma FXI levels. The molecular pathology of FXI deficiency is mutation in
> the F11 gene on the chromosome band 4q35. We report a novel mutation of the
> F11 gene in an 18-year-old asymptomatic Korean woman with mild FXI
> deficiency. Pre-operative laboratory screen tests for lipoma on her back
> revealed slightly prolonged activated partial thromboplastin time (45.2
> sec; reference range, 23.2-39.4 sec). Her FXI activity (35%) was slightly
> lower than the normal FXI activity (reference range, 50-150%). Direct
> sequence analysis of the F11 gene revealed a heterozygous A to G
> substitution in nucleotide 1517 (c.1517A>G) of exon 13, resulting in the
> substitution of aspartic acid with glycine in codon 506 (p.Asp506Gly). To
> the best of our knowledge, the Asp506Gly is a novel missense mutation, and
> this is the first genetically confirmed case of mild FXI deficiency in
> Korea.
>
>
>
> When I run the same processing using cTAKES and the
> AggregatePlaintextFastUMLSProcessor annotator file, no drug entity is
> detected inside the drugner leaf.
>
> For example, I would have thought that nucleotide would be classified in
> the drugner leaf.
>
>
>
> Please could you correct me if I missed anything ?
>
>
>
> Christian
>
>
> This message contains information that may be privileged or confidential
> and is the property of the Capgemini Group. It is intended only for the
> person to whom it is addressed. If you are not the intended recipient, you
> are not authorized to read, print, retain, copy, disseminate, distribute,
> or use this message or any part thereof. If you receive this message in
> error, please notify the sender immediately and delete all copies of this
> message.
>


Re: detection of chemical/drug entities

2017-03-23 Thread James Masanz
Whether nucleotide is classified as a medication depends on the dictionary
used. It appears nucleotide is not within the default dictionary as a
medication

In order to have drugner annotations,
org.apache.ctakes.drugner.ae.DrugMentionAnnotator
needs to be in the pipeline.
(When using xml descriptors, that's
desc\ctakes-drug-ner\desc\analysis_engine\DrugMentionAnnotator.xml)

However, I see that annotator listed in the AggregatePlaintextFastUMLSProcessor
pipeline. I'll take a look at why the drugner annotations are not being
created.

When I use AggregatePlaintextFastUMLSProcessor in 3.2.2, medications *are*
getting annotated - but they are being annotated as
org.apache.ctakes.typesystem.type.textsem.MedicationEventMention
which is found under
org.apache.ctakes.typesystem.type.textsem.EventMention

hope this helps
-- James




On Thu, Mar 23, 2017 at 9:35 AM, MAZE, Christian 
wrote:

> Hi,
>
>
>
> I am using the v3.2.2 ctakes.
>
> I copied the v3.2.1.1 resources.
>
>
>
> I used another tool for extracting chemical/drug entities and when
> processing my input text, I got the following results:
>
> 1092   1107   nucleotide 1517
>
> 1165   1178   aspartic acid
>
> 1184   1191   glycine
>
> 1195   1204   codon 506
>
> 1254   1257   Asp
>
>
>
> My input test text is :
>
> A novel missense mutation Asp506Gly in Exon 13 of the F11 gene in an
> asymptomatic Korean woman with mild factor XI deficiency. Factor XI (FXI)
> deficiency is a rare autosomal recessive coagulation disorder most commonly
> found in Ashkenazi and Iraqi Jews, but it is also found in other ethnic
> groups. It is a trauma or surgery-related bleeding disorder, but
> spontaneous bleeding is rarely seen. The clinical manifestation of bleeding
> in FXI deficiency cases is variable and seems to poorly correlate with
> plasma FXI levels. The molecular pathology of FXI deficiency is mutation in
> the F11 gene on the chromosome band 4q35. We report a novel mutation of the
> F11 gene in an 18-year-old asymptomatic Korean woman with mild FXI
> deficiency. Pre-operative laboratory screen tests for lipoma on her back
> revealed slightly prolonged activated partial thromboplastin time (45.2
> sec; reference range, 23.2-39.4 sec). Her FXI activity (35%) was slightly
> lower than the normal FXI activity (reference range, 50-150%). Direct
> sequence analysis of the F11 gene revealed a heterozygous A to G
> substitution in nucleotide 1517 (c.1517A>G) of exon 13, resulting in the
> substitution of aspartic acid with glycine in codon 506 (p.Asp506Gly). To
> the best of our knowledge, the Asp506Gly is a novel missense mutation, and
> this is the first genetically confirmed case of mild FXI deficiency in
> Korea.
>
>
>
> When I run the same processing using cTAKES and the
> AggregatePlaintextFastUMLSProcessor annotator file, no drug entity is
> detected inside the drugner leaf.
>
> For example, I would have thought that nucleotide would be classified in
> the drugner leaf.
>
>
>
> Please could you correct me if I missed anything ?
>
>
>
> Christian
>
>
> This message contains information that may be privileged or confidential
> and is the property of the Capgemini Group. It is intended only for the
> person to whom it is addressed. If you are not the intended recipient, you
> are not authorized to read, print, retain, copy, disseminate, distribute,
> or use this message or any part thereof. If you receive this message in
> error, please notify the sender immediately and delete all copies of this
> message.
>


Re: Subscription

2017-03-19 Thread James Masanz
People can subscribe themselves by sending a plaintext (not html format)
email to the appropriate *-subscr...@ctakes.apache.org email address(es) as
described here:


https://ctakes.apache.org/mailing-lists.html


On Fri, Mar 17, 2017 at 6:14 AM, Hari, Sekhar  wrote:

> Hi there -
>
> Can you kindly subscribe the following email IDs in these groups? Many
> thanks.
>
> monika.rajam...@cgi.com
> saroj.pa...@cgi.com
>
> Thanks
> Sekhar Hari | Program Lead
> Health Sciences Business Innovation
> ASDC CGI Health Solutions
> Electronic City, Bangalore
> Karnataka, India 560100
>
> 814 7027 779 (C)
> 080 6642 2536 (D)
>


Re: Named entity recognition

2017-03-09 Thread James Masanz
I've posted the pamphlet/manual to the Wiki for 3.2.2.
https://cwiki.apache.org/confluence/download/attachments/44302415/cTAKES.Hackphlet.3.2.2.pdf


On Thu, Mar 9, 2017 at 11:50 AM, James Masanz 
wrote:

> I'm making some minor changes to the pamphlet Guergana mentioned and will
> send it out or post it later today.
>
> On Wed, Mar 8, 2017 at 5:03 PM, Savova, Guergana <
> guergana.sav...@childrens.harvard.edu> wrote:
>
>> Yes, we have a guide (aka pamphlet) with a description of the cTAKES
>> basics. We will be distributing it with the 4.0 release targeted at the end
>> of the month.
>>
>>
>>
>> Sean Finan might be able to distribute the pamphlet now…
>>
>> --Guergana
>>
>>
>>
>>
>>
>> *From:* Kevin B. Cohen [mailto:kevin.co...@gmail.com]
>> *Sent:* Wednesday, March 8, 2017 4:27 PM
>> *To:* user@ctakes.apache.org
>> *Subject:* Re: Named entity recognition
>>
>>
>>
>> Alden, if you find such a beginner's guide and could distribute its
>> whereabouts to the rest of us, it would be great.
>>
>>
>>
>>
>>
>> Kevin
>>
>>
>>
>> On Tue, Mar 7, 2017 at 6:39 PM, Alden Gordon  wrote:
>>
>> Does anyone have a beginners guide to applying cTAKES named entity
>> recognition to a corpus?
>>
>>
>>
>> I have used the aggregate plain text processor with the collection
>> process engine on my data (the text of a PCP - specialist curbside
>> consult), but it seems to only capture capitalization and part of speech. I
>> would like to capture negation and assign relevant words to SNOMED
>> concepts.
>>
>>
>>
>> Thank you in advance for any general guidance.
>>
>>
>>
>> Additionally, if anyone has any advice on using cTAKES with python, I
>> would appreciate the help
>>
>>
>>
>> Best,
>>
>> Alden
>>
>>
>>
>> --
>>
>> Alden Gordon
>>
>> Analytics Lead | 860.402.6572 | al...@rubiconmd.com
>> 
>>
>> Visit Care Without Constraints
>> <https://www.linkedin.com/company/care-without-constraints>
>>  to learn how eConsults are removing barriers to care.
>>
>>
>>
>>
>>
>> --
>>
>> Kevin Bretonnel Cohen, PhD
>> Director, Biomedical Text Mining Group
>> Computational Bioscience Program, U. Colorado School of Medicine
>>
>> Chair in Natural Language Processing for the Biomedical Domain
>>
>> Université Paris-Saclay, LIMSI-CNRS
>>
>> 303-916-2417
>> http://compbio.ucdenver.edu/Hunter_lab/Cohen
>>
>>
>>
>


Re: Named entity recognition

2017-03-09 Thread James Masanz
I'm making some minor changes to the pamphlet Guergana mentioned and will
send it out or post it later today.

On Wed, Mar 8, 2017 at 5:03 PM, Savova, Guergana <
guergana.sav...@childrens.harvard.edu> wrote:

> Yes, we have a guide (aka pamphlet) with a description of the cTAKES
> basics. We will be distributing it with the 4.0 release targeted at the end
> of the month.
>
>
>
> Sean Finan might be able to distribute the pamphlet now…
>
> --Guergana
>
>
>
>
>
> *From:* Kevin B. Cohen [mailto:kevin.co...@gmail.com]
> *Sent:* Wednesday, March 8, 2017 4:27 PM
> *To:* user@ctakes.apache.org
> *Subject:* Re: Named entity recognition
>
>
>
> Alden, if you find such a beginner's guide and could distribute its
> whereabouts to the rest of us, it would be great.
>
>
>
>
>
> Kevin
>
>
>
> On Tue, Mar 7, 2017 at 6:39 PM, Alden Gordon wrote:
>
> Does anyone have a beginner's guide to applying cTAKES named entity
> recognition to a corpus?
>
>
>
> I have used the aggregate plain text processor with the collection process
> engine on my data (the text of a PCP - specialist curbside consult), but it
> seems to only capture capitalization and part of speech. I would like to
> capture negation and assign relevant words to SNOMED concepts.
>
>
>
> Thank you in advance for any general guidance.
>
>
>
> Additionally, if anyone has any advice on using cTAKES with Python, I
> would appreciate the help.
>
>
>
> Best,
>
> Alden
>
>
>
> --
>
> Alden Gordon
>
> Analytics Lead | 860.402.6572 | al...@rubiconmd.com
> 
>
> Visit Care Without Constraints to learn how eConsults are removing barriers
> to care.
>
>
>
>
>
> --
>
> Kevin Bretonnel Cohen, PhD
> Director, Biomedical Text Mining Group
> Computational Bioscience Program, U. Colorado School of Medicine
>
> Chair in Natural Language Processing for the Biomedical Domain
>
> Université Paris-Saclay, LIMSI-CNRS
>
> 303-916-2417
> http://compbio.ucdenver.edu/Hunter_lab/Cohen
> 
>
>
>


Fwd: wiki wishlist

2017-03-01 Thread James Masanz
I suggest we use this thread + a JIRA item to compile a list of wiki
changes people would like - formatting, content, anything to do with
updating the cTAKES wiki, which is
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

I'll start the list with these items:
   - make the sidebar show the most recent cTAKES release at the top
(reverse chronological order) (Done - Just did it!)
   - incorporate any comments made within the Wiki, such as this one:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+User+Install+Guide?focusedCommentId=34013875#comment-34013875

To add to the wish list, either comment on CTAKES-420 or reply to this thread.

Thanks!
James


[ANNOUNCE] Apache cTAKES 3.1.1 released

2013-12-05 Thread James Masanz
The Apache cTAKES team is pleased to announce the availability of the
3.1.1 release.


For the complete release notes, please visit
http://s.apache.org/ctakes-3.1.1-release-notes


Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
an open-source natural language processing system for information
extraction from electronic medical record clinical free-text.

The release can be downloaded from
http://ctakes.apache.org/downloads.cgi

For further information, please visit the project website at
http://ctakes.apache.org/

-- The Apache cTAKES Team


[ANNOUNCE] Apache cTAKES 3.1.0 released

2013-09-16 Thread James Masanz
The Apache cTAKES team is pleased to announce the availability of the 3.1.0
release.

This release is our first release as a top-level project.

For the complete release notes, please visit
http://s.apache.org/ctakes-3.1.0-release-notes

Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
an open-source natural language processing system for information
extraction from electronic medical record clinical free-text.

The release can be downloaded from:
http://ctakes.apache.org/downloads.cgi

For further information, please visit the project website at
http://ctakes.apache.org/

-- The Apache cTAKES Team