Re: cTAKES Pipeline [EXTERNAL]

Finan, Sean Mon, 22 Jul 2019 07:25:35 -0700

Hi Siamak,

Just to clarify, $CTAKES_HOME isn't required by ctakes itself.  It is in the 
bin/ scripts just to make the java command at the end of the script more 
concise.  
The script attempts to set it to the directory in which you installed a binary 
installation of ctakes.  
The $CTAKES_HOME directory should contain the ctakes directories bin/ lib/ 
resources/ ...


Also, please be aware that those bin/ scripts are not meant to be used with a 
developer installation.  
The scripts are meant to be used with a built binary installation of ctakes - 
one with bin/ lib/ resources/ ... subdirectories all in one [ctakes home, 
ctakes root] directory.   
A developer [root] directory has subdirectories  ctakes-core/ ctakes-core-res/ 
ctakes-clinical-pipeline/ ctakes-clinical-pipeline-res/ ...
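
As a rough illustration of the difference between the two layouts, here is a minimal sketch that guesses which kind of directory a path points to. The class name CtakesLayoutCheck and its methods are hypothetical and not part of cTAKES; the sketch only looks for the subdirectory names mentioned above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of cTAKES): guesses what kind of directory a path is. */
final class CtakesLayoutCheck {

    enum Layout { BINARY, DEVELOPER, UNKNOWN }

    // Subdirectory names taken from the description above; real installs contain more.
    private static final List<String> BINARY_DIRS =
            Arrays.asList("bin", "lib", "resources");
    private static final List<String> DEVELOPER_DIRS =
            Arrays.asList("ctakes-core", "ctakes-core-res");

    /** Returns BINARY or DEVELOPER when all expected subdirectories exist, else UNKNOWN. */
    static Layout classify(File root) {
        if (hasAll(root, BINARY_DIRS)) {
            return Layout.BINARY;
        }
        if (hasAll(root, DEVELOPER_DIRS)) {
            return Layout.DEVELOPER;
        }
        return Layout.UNKNOWN;
    }

    private static boolean hasAll(File root, List<String> names) {
        for (String name : names) {
            if (!new File(root, name).isDirectory()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pass the candidate directory as the first argument; defaults to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(root.getAbsolutePath() + " looks like: " + classify(root));
    }
}
```

If the result is BINARY, that directory is a reasonable candidate for $CTAKES_HOME in the bin/ scripts.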

Can I assume, since you are writing java code (using PipelineBuilder), that you 
have some experience with java?
You can put PipelineBuilder in any main(..) method and then start that main(..) 
from a command line just as you would any other java program.  Just like any 
other java program, you need to have your $CLASSPATH set correctly and, for 
memory use, increase your maximum memory with -Xmx .  These are VM options.
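
To check that the VM options took effect, a small sketch like the following can be run with the same -cp and -Xmx settings you plan to use for the pipeline. VmOptionsCheck is a hypothetical class name, not something shipped with cTAKES; the two class names it looks up are the ones mentioned in this thread.

```java
import java.io.File;

/** Hypothetical helper (not part of cTAKES): reports classpath and heap settings. */
final class VmOptionsCheck {

    /** Maximum heap the VM will try to use, in megabytes (controlled by -Xmx). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Classpath entries as the VM sees them. */
    static String[] classpathEntries() {
        return System.getProperty("java.class.path", "").split(File.pathSeparator);
    }

    /** True if the named class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, VmOptionsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        System.out.println("Classpath entries: " + classpathEntries().length);
        // Class names as given in this thread.
        String[] wanted = {
            "org.apache.ctakes.core.pipeline.PiperFileRunner",
            "org.apache.uima.tools.cvd.CVD"
        };
        for (String name : wanted) {
            System.out.println(name + " found: " + isOnClasspath(name));
        }
    }
}
```

If either class is reported as not found, the classpath is the first thing to fix before looking at the pipeline itself.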

The same goes for running the org.apache.uima.tools.cvd.CVD .  It is just 
another regular old java class with a main(..) method.

To use the CVD to inspect output you just need to make sure that you produce 
XMI files.  With the PipelineBuilder this is really easy.
After adding the AEs to the pipeline, just put .writeXmis( myOutputDirectory ) 
at the end of your builder.

new PipelineBuilder()
    .readFiles( myInDir )
    .add( SimpleSegmentAnnotator.class )     // for instance
    // .add( ... ) for each further annotator
    .writeXmis( myOutDir )
    .run();

Or some variation of that.  See the ctakes-examples module for more, for 
instance org.apache.ctakes.examples.pipeline    HelloWorldBuilderRunner

-- If you need to do anything special with the actual pipeline (e.g. 
special AE interfaces, UIMA control) after it has been built, then I recommend 
that you use the PiperFileReader and a piper file.  A piper file helps make the 
pipeline more transparent (outside the code), and allows modification of the 
pipeline without recompiling (and reinstalling).
Then you can get the PipelineBuilder from the PiperFileReader.
See org.apache.ctakes.examples.pipeline    HelloWorldPiperRunner

PiperFileReader reader = new PiperFileReader( myPiperFilePath );
PipelineBuilder builder = reader.getBuilder();

--- If you don't need to do anything special with the pipeline then I recommend 
that you just use the PiperFileRunner, which does everything for you.

java [VM options] org.apache.ctakes.core.pipeline.PiperFileRunner -p 
myPiperFilePath -i myInDir --writeXmis myOutDir
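
If you would rather start that command from your own java code than from a shell, the argument list can be assembled as in this minimal sketch. PiperRunnerCommand is a hypothetical name, the classpath entries are copied from the bin/ script quoted below, and the options are the ones shown above; check them against your ctakes version before relying on them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of cTAKES): assembles the PiperFileRunner command line. */
final class PiperRunnerCommand {

    /**
     * Builds the argument list only; nothing is launched here.
     * ctakesHome is assumed to be a binary installation with desc/ resources/ lib/.
     */
    static List<String> build(String ctakesHome, String maxHeap,
                              String piperFile, String inDir, String outDir) {
        String classpath = String.join(File.pathSeparator,
                ctakesHome + "/desc/",
                ctakesHome + "/resources/",
                ctakesHome + "/lib/*");
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(classpath);
        command.add("-Xmx" + maxHeap);          // VM option: maximum heap, e.g. 3g
        command.add("org.apache.ctakes.core.pipeline.PiperFileRunner");
        command.add("-p");
        command.add(piperFile);
        command.add("-i");
        command.add(inDir);
        command.add("--writeXmis");             // option as written in this message
        command.add(outDir);
        return command;
    }

    public static void main(String[] args) {
        // Placeholder paths; replace them with your own.
        List<String> command = build("/path/to/ctakes", "3g",
                "myPiperFilePath", "myInDir", "myOutDir");
        System.out.println(String.join(" ", command));
        // To launch: new ProcessBuilder(command).inheritIO().start();
    }
}
```

Keeping the command as a list avoids quoting problems when paths contain spaces.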

One final note:  CVD is not a ctakes tool and ctakes itself does not contain 
CVD code.  The CVD is an Apache UIMA product, and help for it can be found online. 
 
https://uima.apache.org/d/uimaj-current/tools.html

runctakesCVD is a convenience script, and its name may be a little misleading.  
Something like runUimaCvd might cause less confusion; let us know your thoughts 
on changing the name.  For instance, would a different name make looking for 
help faster or easier?

Sean

________________________________________
From: Siamak Barzegar <[email protected]>
Sent: Monday, July 22, 2019 4:38 AM
To: [email protected]
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

I have the same question: I want to run runctakesCVD or ..CPE on my
modified code and descriptors without using an IDE, so what should I set my
CTAKES_HOME variable to? I do not use the Piper GUI. I am
using PipelineBuilder on the source code.
It is important to see the results in runctakesCVD as well. But what
should I set my CTAKES_HOME variable to in the runctakesCVD.sh file?
java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/*
-Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
org.apache.uima.tools.cvd.CVD "$@"

With Best Wishes,
Siamak

On Fri, 19 Jul 2019 at 20:12, Maral Amir <[email protected]> wrote:

> Hi Sean,
>
> Thank you so much for your insightful response.
> I'm having a problem linking the piper files. I should mention I am using
> the command line interface. Could you please let me know:
>
> 1. What should I set my CTAKES_HOME variable to? Right now I set my
> CTAKES_HOME to my cTAKES user installation main folder. That is because I
> could see in the last line of the runPiperFile.sh, the class directory
> $CTAKES_HOME/lib/* is included and no /lib folder is present in the
> developer's version.
>
> java -cp
>
> $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
> org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
>
>
> Also,
>
> 2. Where is the *"bin"* folder in which the bash file resides? Right now I use
> this one :
> /Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
>
>
> Thanks,
> Maral
>
> On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
> [email protected]> wrote:
>
> > Hi Maral,
> >
> > You can generate different output types by adding different writers to the
> > end of the pipeline.
> > Here are the contents of the Default Clinical Pipeline piper file:
> >
> >
> >
> ========================================================================================
> > // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
> >
> > // Load a simple token processing pipeline from another pipeline file
> > load DefaultTokenizerPipeline
> >
> > // Add non-core annotators
> > add ContextDependentTokenizerAnnotator
> > addDescription POSTagger
> >
> > // Add Chunkers
> > load ChunkerSubPipe
> >
> > // Default fast dictionary lookup
> > load DictionarySubPipe
> >
> > // Add Cleartk Entity Attribute annotators
> > load AttributeCleartkSubPipe
> >
> >
> ========================================================================================
> >
> >
> > I recommend that you copy those lines to a new file (for instance,
> > Maral.piper) and then add the following lines:
> >
> >
> >
> ========================================================================================
> > // Write marked copy of note text in interactive html files
> > add pretty.html.HtmlTextWriter SubDirectory=HTML
> >
> > // Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
> > package org.apache.ctakes.fhir.cc
> > add FhirJsonFileWriter SubDirectory=FHIR
> >
> > // Write plaintext copy of note text with cui, semantic group, POS.  Relations are listed.
> > add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
> >
> > // Write plaintext copy of note sentences with entity and relation discoveries listed.
> > add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
> >
> >
> ========================================================================================
> >
> >
> > The output directory should then contain some new output in different
> > subdirectories.  You can change the subdirectory names.
> >
> > Note: the "=================================" are just there to indicate
> > what is for the file.  Do not copy them.
> >
> > There are many more file writers, most of which write simple lists of
> > discoveries in one form or another.
> > I recommend trying the 4 above and see if any fit your purposes before
> > moving on to more specialized writers.
> >
> > Sean
> >
> > ________________________________________
> > From: Maral Amir <[email protected]>
> > Sent: Thursday, July 18, 2019 7:11 PM
> > To: [email protected]
> > Subject: Re: cTAKES Pipeline [EXTERNAL]
> >
> > Hi Sean,
> >
> > Thank you so much for your very helpful and comprehensive response. I was
> > able to generate the xmi results in the output directory and used the UIMA
> > CAS Visual Debugger (CVD) as suggested to view the information. I have two
> > questions:
> > 1. What is the best reference for me to study and understand the
> > annotations?
> > 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
> > readable format without the help of CVD.
> >
> > Thanks,
> > Maral
> >
> >
> > On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
> > [email protected]> wrote:
> >
> > > Hi Maral,
> > >
> > > This might be what you are talking about with respect to the Default
> > > Clinical Pipeline
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> > >
> > > That lists a command line method for running a set of files and getting
> > > xml output.
> > >
> > > The default clinical pipeline configuration is actually contained in the
> > > plain text (piper) file
> > > resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> > >
> > > If you are looking at source code then the file is
> > > ctakes-clinical-pipeline-res/src/main/resources/ ...
> > >
> > > You can also select and run a piper file with a gui
> > >
> >
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
> > >
> > > Both methods are mentioned near the bottom of one of the pages detailing
> > > pipeline configuration
> > >
> >
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > >
> > > There are several example pipelines constructed with code and/or plain
> > > text files in the ctakes-examples and ctakes-examples-res modules.  You can
> > > look at the different "Hello World" examples.
> > >
> > > Since you are playing with maven, you can run the profile "runPiperGui".
> > > mvn clean compile -DskipTests -PrunPiperGui
> > >
> > > Sean
> > >
> > >
> > > ________________________________________
> > > From: Maral Amir <[email protected]>
> > > Sent: Thursday, July 18, 2019 2:29 PM
> > > To: [email protected]
> > > Subject: cTAKES Pipeline [EXTERNAL]
> > >
> > > Hi,
> > >
> > > I just built my developer version of cTAKES with the help of wonderful
> > > cTAKES developers.
> > >
> > > For my next step, I would appreciate it if somebody could direct me to
> > > the right path.
> > > I am planning to process clinical text documents through the entire
> > > pipeline to generate xml output. I see the website suggests walking
> > > through the Default Clinical Pipeline. I understand there are also
> > > multiple git repositories with command line tools based on Apache cTAKES.
> > > My final goal is to integrate cTAKES with some Python packages (OCR,
> > > etc.) into one pipeline and have some form of web service at the end. I
> > > would deeply appreciate any suggestions.
> > >
> > > Thanks,
> > > Maral
> > >
> >
>


--
Siamak Barzegar, PhD.
Senior Research Engineer.
Biomedical Text Mining Unit.
Barcelona Supercomputing Centre
