Re: cTAKES Pipeline [EXTERNAL]

Finan, Sean Fri, 19 Jul 2019 12:09:21 -0700

Hi Maral,

There are two slightly different directory structures: one for development
(source structure), another for end use (installation structure).


Since you have a copy of the source, let's start with that.

Are you using an IDE (Integrated Development Environment) such as IntelliJ or 
Eclipse? 
If so, you should be able to create a run profile that can run pipelines.
There is plenty of help online for that kind of thing, and people on the
mailing list can probably provide examples of what they have used for cTAKES.

If you are not using an IDE, I suggest using the -PrunPiperGui Maven profile
that I mentioned below, just to run a test pipeline and see your output.
After you have a successful run, we can move on to other run methods.

When you use an IDE run profile or a Maven profile, you don't need to specify
$CTAKES_HOME or worry about the classpath or bin/.
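
A minimal sketch of the Maven route, assuming the runPiperGui profile is
reachable from the pom.xml at the root of the source checkout. The function
name run_piper_gui and the pom.xml check are illustrative and not part of
cTAKES:

```shell
#!/usr/bin/env bash
# Sketch: launch the piper GUI through Maven from a cTAKES source checkout.
# Assumption: the directory passed in is the source root and holds pom.xml.

run_piper_gui() {
  local src_root="$1"
  if [ ! -f "$src_root/pom.xml" ]; then
    echo "No pom.xml in $src_root; is this the source root?" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd "$src_root" && mvn clean compile -DskipTests -PrunPiperGui)
}

# Usage (path is an example, adjust to your checkout):
# run_piper_gui /Users/local/projects/ctakes/trunk
```

Note that nothing here sets CTAKES_HOME or a classpath; Maven resolves both
from the project itself.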

Sean

________________________________________
From: Maral Amir <[email protected]>
Sent: Friday, July 19, 2019 2:12 PM
To: [email protected]
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

Thank you so much for your insightful response.
I'm having a problem linking the piper files. I should mention I am using
the command line interface. Could you please kindly let me know:

1. What should I set my CTAKES_HOME variable to? Right now I set
CTAKES_HOME to my cTAKES user installation main folder. That is because
the last line of runPiperFile.sh includes the class directory
$CTAKES_HOME/lib/*, and no /lib folder is present in the
developer's version.

java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/* \
  -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g \
  org.apache.ctakes.core.pipeline.PiperFileRunner "$@"


Also,

2. Where is the *"bin"* folder where the bash file resides? Right now I use
this one:
/Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
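
For illustration, the classpath from the last line of runPiperFile.sh can be
sketched as a small helper that refuses to continue when CTAKES_HOME has no
lib/ folder, which is the case for a source tree. The function name
ctakes_classpath is hypothetical; the classpath entries are copied from the
command quoted above:

```shell
#!/usr/bin/env bash
# Sketch: build the classpath only if the given directory looks like an
# installation structure (it must contain lib/).

ctakes_classpath() {
  local home="$1"
  if [ ! -d "$home/lib" ]; then
    echo "No lib/ under $home: this looks like a source tree, not an installation" >&2
    return 1
  fi
  # lib/* stays literal inside the quotes; java expands the wildcard itself.
  printf '%s\n' "$home/desc/:$home/resources/:$home/resources/resources:$home/lib/*"
}

# Usage:
# cp=$(ctakes_classpath "$CTAKES_HOME") && java -cp "$cp" \
#   -Dlog4j.configuration="file:$CTAKES_HOME/config/log4j.xml" -Xms512M -Xmx3g \
#   org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
```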


Thanks,
Maral

On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
[email protected]> wrote:

> Hi Maral,
>
> You can generate different output types by adding different writers to the
> end of the pipeline.
> Here are the contents of the Default Clinical Pipeline piper file:
>
>
> ========================================================================================
> // Commands and parameters to create a default plaintext document
> // processing pipeline with UMLS lookup
>
> // Load a simple token processing pipeline from another pipeline file
> load DefaultTokenizerPipeline
>
> // Add non-core annotators
> add ContextDependentTokenizerAnnotator
> addDescription POSTagger
>
> // Add Chunkers
> load ChunkerSubPipe
>
> // Default fast dictionary lookup
> load DictionarySubPipe
>
> // Add Cleartk Entity Attribute annotators
> load AttributeCleartkSubPipe
>
> ========================================================================================
>
>
> I recommend that you copy those lines to a new file (for instance,
> Maral.piper) and then add the following lines:
>
>
> ========================================================================================
> // Write marked copy of note text in interactive html files
> add pretty.html.HtmlTextWriter SubDirectory=HTML
>
> // Write Fast Healthcare Interoperability Resources (FHIR) JSON files.
> // fhir.org
> package org.apache.ctakes.fhir.cc
> add FhirJsonFileWriter SubDirectory=FHIR
>
> // Write plaintext copy of note text with cui, semantic group, POS.
> // Relations are listed.
> add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
>
> // Write plaintext copy of note sentences with entity and relation
> // discoveries listed.
> add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
>
> ========================================================================================
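
One way to assemble the file described above is a short script that writes
both listings into a single piper file. This is only a sketch: the function
name write_piper is hypothetical, the comments are shortened, and the file
name Maral.piper is the example name suggested above.

```shell
#!/usr/bin/env bash
# Sketch: write the default clinical pipeline plus the four writers to a file.
# The piper commands are copied from the two listings above.

write_piper() {
  local target="${1:-Maral.piper}"
  cat > "$target" <<'EOF'
// Default plaintext document processing pipeline with UMLS lookup
load DefaultTokenizerPipeline
add ContextDependentTokenizerAnnotator
addDescription POSTagger
load ChunkerSubPipe
load DictionarySubPipe
load AttributeCleartkSubPipe

// Writers, one output subdirectory each
add pretty.html.HtmlTextWriter SubDirectory=HTML
package org.apache.ctakes.fhir.cc
add FhirJsonFileWriter SubDirectory=FHIR
add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
EOF
}

# Usage:
# write_piper Maral.piper
```

Every comment in a piper file has to start with // on its own line, so
wrapped comment text from an email needs the prefix restored before use.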
>
>
> The output directory should then contain some new output in different
> subdirectories.  You can change the subdirectory names.
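
A quick way to confirm that each writer produced something is to count the
files per subdirectory. The sketch below assumes the four SubDirectory names
used above were left unchanged; the function name summarize_output is
hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: report how many files each writer left in its output subdirectory.
# The argument is the output directory that was given to the pipeline run.

summarize_output() {
  local out_dir="$1" sub count
  for sub in HTML FHIR TEXT PROP; do
    if [ -d "$out_dir/$sub" ]; then
      # tr strips the padding some wc implementations add around the number.
      count=$(find "$out_dir/$sub" -type f | wc -l | tr -d ' ')
      echo "$sub: $count file(s)"
    else
      echo "$sub: missing"
    fi
  done
}

# Usage:
# summarize_output /path/to/output
```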
>
> Note: the "=================================" are just there to indicate
> what is for the file.  Do not copy them.
>
> There are many more file writers, most of which write simple lists of
> discoveries in one form or another.
> I recommend trying the four above to see if any fit your purposes before
> moving on to more specialized writers.
>
> Sean
>
> ________________________________________
> From: Maral Amir <[email protected]>
> Sent: Thursday, July 18, 2019 7:11 PM
> To: [email protected]
> Subject: Re: cTAKES Pipeline [EXTERNAL]
>
> Hi Sean,
>
> Thank you so much for your very helpful and comprehensive response. I was
> able to generate the xmi results in the output directory and used UIMA Cas
> Visual Debugger (CVD) as suggested to view the information. I have two
> questions:
> 1. What is the best reference for me to study and understand the
> annotations?
> 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
> readable format without the help of CVD.
>
> Thanks,
> Maral
>
>
> On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
> [email protected]> wrote:
>
> > Hi Maral,
> >
> > This might be what you are talking about with respect to the Default
> > Clinical Pipeline
> >
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> >
> > That lists a command line method for running a set of files and getting
> > xml output.
> >
> > The default clinical pipeline configuration is actually contained in the
> > plain text (piper) file
> > resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> >
> > If you are looking at source code then the file is
> > ctakes-clinical-pipeline-res/src/main/resources/ ...
> >
> > You can also select and run a piper file with a gui
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
> >
> > Both methods are mentioned near the bottom of one of the pages detailing
> > pipeline configuration
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> >
> > There are several example pipelines constructed with code and/or plain
> > text files in the ctakes-examples and ctakes-examples-res modules.  You
> > can look at the different "Hello World" examples.
> >
> > Since you are playing with maven, you can run the profile "runPiperGui".
> > mvn clean compile -DskipTests -PrunPiperGui
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Maral Amir <[email protected]>
> > Sent: Thursday, July 18, 2019 2:29 PM
> > To: [email protected]
> > Subject: cTAKES Pipeline [EXTERNAL]
> >
> > Hi,
> >
> > I just built my developer version of cTAKES with the help of the wonderful
> > cTAKES developers.
> >
> > For my next step, I would appreciate it if somebody could direct me to the
> > right path.
> > I am planning to process text clinical documents through the entire
> > pipeline to generate xml output. I see the website suggests walking
> > through the Default Clinical Pipeline. I understand there are also multiple
> > git repositories with command line tools built on Apache cTAKES.
> > My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
> > into one pipeline and have some form of web service at the end. I would
> > deeply appreciate any suggestions.
> >
> > Thanks,
> > Maral
> >
>
