Re: cTAKES Pipeline [EXTERNAL]

Maral Amir Mon, 22 Jul 2019 08:39:16 -0700

Hi Sean,

Thank you for your reply. Your response to Siamak was very helpful and it
answered many of my questions. I appreciate your kind help.


Best,
Maral

On Mon, Jul 22, 2019 at 7:25 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Siamak,
>
> Just to clarify, $CTAKES_HOME isn't required by ctakes itself.  It is in
> the bin/ scripts just to make the java command at the end of the script
> more concise.
> The script attempts to set it to the directory in which you installed a
> binary installation of ctakes.
> The $CTAKES_HOME directory should contain the ctakes directories bin/ lib/
> resources/ ...
>
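The layout described above can be checked before CTAKES_HOME is set. The following is only a sketch: the function name is_ctakes_home and the placeholder path /opt/apache-ctakes are assumptions made for illustration, and only the three subdirectories named in the message (bin, lib, resources) are tested.

```shell
# Hypothetical helper: succeed only if the directory looks like a binary
# cTAKES installation, i.e. it contains bin/, lib/ and resources/.
is_ctakes_home() {
    dir="$1"
    for sub in bin lib resources; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            return 1
        fi
    done
    return 0
}

# Usage: export CTAKES_HOME only when the layout check passes.
# /opt/apache-ctakes is a placeholder, not a documented default location.
candidate="${1:-/opt/apache-ctakes}"
if is_ctakes_home "$candidate"; then
    export CTAKES_HOME="$candidate"
    echo "CTAKES_HOME=$CTAKES_HOME"
else
    echo "not a binary cTAKES installation: $candidate" >&2
fi
```

A developer checkout (ctakes-core/, ctakes-core-res/, ...) fails this check, which matches the point that the bin/ scripts are not meant for it.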
> Also, please be aware that those bin/ scripts are not meant to be used
> with a developer installation.
> The scripts are meant to be used with a built binary installation of
> ctakes - one with bin/ lib/ resources/ ... subdirectories all in one
> [ctakes home, ctakes root] directory.
> A developer [root] directory has subdirectories  ctakes-core/
> ctakes-core-res/ ctakes-clinical-pipeline/ ctakes-clinical-pipeline-res/ ...
>
> Can I assume, since you are writing java code (using PipelineBuilder),
> that you have some experience with java?
> You can put PipelineBuilder in any main(..) method and then start that
> main(..) from a command line just as you would any other java program.
> Just like any other java program, you need to have your $CLASSPATH set
> correctly and, for memory use, increase your maximum memory with -Xmx .
> These are VM options.
>
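As a rough sketch of the classpath and -Xmx advice above, the command for a class of your own could be assembled like this. The class name my.example.MyPipelineRunner, the target/classes directory, and the function name build_java_cmd are hypothetical; the cTAKES classpath entries mirror the ones in the bin/ scripts quoted elsewhere in this thread.

```shell
# Sketch: print the java command for a class that has a main(..) method.
build_java_cmd() {
    home="$1"            # directory of a binary cTAKES installation
    main_class="$2"      # fully qualified class with a main(..) method
    extra_cp="$3"        # directory or jar holding your own compiled classes
    max_heap="${4:-3g}"  # value passed to -Xmx
    echo "java -cp $home/desc/:$home/resources/:$home/lib/*:$extra_cp -Xms512M -Xmx$max_heap $main_class"
}

# Print the command instead of running it, so it can be inspected first.
build_java_cmd "${CTAKES_HOME:-/path/to/ctakes}" my.example.MyPipelineRunner target/classes 4g
```

Printing the command rather than executing it is a deliberate choice here: it makes it easy to see which classpath and memory settings would be used before anything is started.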
> The same goes for running the org.apache.uima.tools.cvd.CVD .  It is just
> another regular old java class with a main(..) method.
>
> To use the CVD to inspect output you just need to make sure that you
> produce XMI files.  With the PipelineBuilder this is really easy.
> After adding the AEs to the pipeline, just put .writeXmis(
> myOutputDirectory ) at the end of your builder.
>
> new PipelineBuilder()
>    .readFiles( myInDir )
>    .add( SimpleSegmentAnnotator.class )   // for instance
>    // .add( ... ) any further annotators go here
>    .writeXmis( myOutDir )
>    .run();
>
> Or some variation of that.  See the ctakes-examples module for more, for
> instance org.apache.ctakes.examples.pipeline    HelloWorldBuilderRunner
>
> -- If you need to do anything special with the actual pipeline (e.g.
> special ae interfaces, uima control), after it has been built, then I
> recommend that you use the PiperFileReader and a piper file.  A piper file
> helps make the pipeline more transparent (outside the code), and allows
> modification of the pipeline without recompiling (and reinstalling).
> Then you can get the PipelineBuilder from the PiperFileReader.
> See org.apache.ctakes.examples.pipeline    HelloWorldPiperRunner
>
> PiperFileReader reader = new PiperFileReader( myPiperFilePath );
> PipelineBuilder builder = reader.getBuilder();
>
> --- If you don't need to do anything special with the pipeline then I
> recommend that you just use the PiperFileRunner, which does everything for
> you.
>
> java [VM options] org.apache.ctakes.core.pipeline.PiperFileRunner -p
> myPiperFilePath -i myInDir --writeXmis myOutDir
>
> One final note:  CVD is not a ctakes tool and ctakes itself does not
> contain CVD code.  The CVD is an Apache UIMA product, and help for it can
> be found online.
> https://uima.apache.org/d/uimaj-current/tools.html
>
> runctakesCVD is a convenience script, and its name may be a little
> misleading.  Something like runUimaCvd might cause less confusion; let us
> know your thoughts on changing the name.  For instance, would a different
> name make looking for help faster or easier?
>
> Sean
>
> ________________________________________
> From: Siamak Barzegar <barzegar.sia...@gmail.com>
> Sent: Monday, July 22, 2019 4:38 AM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES Pipeline [EXTERNAL]
>
> Hi Sean,
>
> I have the same question (I want to run runctakesCVD or ..CPE on my
> modified code and descriptors, not using an IDE, so what should I set my
> CTAKES_HOME variable to?). I do not use the Piper Gui. I am
> using PipelineBuilder on the source code.
> It is important to see the results with runctakesCVD as well. But what
> should I set my CTAKES_HOME variable to in the runctakesCVD.sh file?
> java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/*
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cvd.CVD "$@"
>
> With Best Wishes,
> Siamak
>
> On Fri, 19 Jul 2019 at 20:12, Maral Amir <maraljav...@gmail.com> wrote:
>
> > Hi Sean,
> >
> > Thank you so much for your insightful response.
> > I'm having a problem linking the piper files. I should mention I am using
> > command line interface. Could you please kindly let me know:
> >
> > 1. What should I set my CTAKES_HOME variable to? Right now I set
> > CTAKES_HOME to my cTAKES user installation main folder. That is because I
> > could see in the last line of runPiperFile.sh that the class directory
> > $CTAKES_HOME/lib/* is included, and no /lib folder is present in the
> > developer's version.
> >
> > java -cp
> > $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
> > -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
> > org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
> >
> >
> > Also,
> >
> > 2. Where is the *"bin"* folder where the bash file resides? Right now I
> > use this one:
> > /Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
> >
> >
> > Thanks,
> > Maral
> >
> > On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Maral,
> > >
> > > You can generate different output types by adding different writers to
> > > the end of the pipeline.
> > > Here are the contents of the Default Clinical Pipeline piper file:
> > >
> > >
> > > ========================================================================================
> > > // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
> > >
> > > // Load a simple token processing pipeline from another pipeline file
> > > load DefaultTokenizerPipeline
> > >
> > > // Add non-core annotators
> > > add ContextDependentTokenizerAnnotator
> > > addDescription POSTagger
> > >
> > > // Add Chunkers
> > > load ChunkerSubPipe
> > >
> > > // Default fast dictionary lookup
> > > load DictionarySubPipe
> > >
> > > // Add Cleartk Entity Attribute annotators
> > > load AttributeCleartkSubPipe
> > >
> > >
> > > ========================================================================================
> > >
> > >
> > > I recommend that you copy those lines to a new file (for instance,
> > > Maral.piper) and then add the following lines:
> > >
> > >
> > > ========================================================================================
> > > // Write marked copy of note text in interactive html files
> > > add pretty.html.HtmlTextWriter SubDirectory=HTML
> > >
> > > // Write Fast Healthcare Interoperability Resources (FHIR) json files. fhir.org
> > > package org.apache.ctakes.fhir.cc
> > > add FhirJsonFileWriter SubDirectory=FHIR
> > >
> > > // Write plaintext copy of note text with cui, semantic group, POS. Relations are listed.
> > > add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
> > >
> > > // Write plaintext copy of note sentences with entity and relation discoveries listed.
> > > add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
> > >
> > >
> > > ========================================================================================
> > >
> > >
> > > The output directory should then contain some new output in different
> > > subdirectories.  You can change the subdirectory names.
> > >
> > > Note: the "=================================" lines are just there to
> > > indicate what is for the file.  Do not copy them.
> > >
> > > There are many more file writers, most of which write simple lists of
> > > discoveries in one form or another.
> > > I recommend trying the 4 above and see if any fit your purposes before
> > > moving on to more specialized writers.
> > >
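One way to put that advice into practice is to write the combined file from the command line. This is only a sketch: the file name Maral.piper comes from the message above, while the PIPER_FILE variable and the output location are assumptions. The piper commands are the ones listed in the message, with wrapped comments shortened and the "====" separator lines left out.

```shell
# Sketch: write the combined piper file (default pipeline plus four writers).
# PIPER_FILE is a hypothetical override for where the file is written.
piper_file="${PIPER_FILE:-./Maral.piper}"

cat > "$piper_file" <<'EOF'
// Default clinical pipeline commands, copied from the message above
load DefaultTokenizerPipeline
add ContextDependentTokenizerAnnotator
addDescription POSTagger
load ChunkerSubPipe
load DictionarySubPipe
load AttributeCleartkSubPipe

// Additional writers, one output subdirectory each
add pretty.html.HtmlTextWriter SubDirectory=HTML
package org.apache.ctakes.fhir.cc
add FhirJsonFileWriter SubDirectory=FHIR
add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
EOF

# Quick sanity check: count the writer lines that were written.
grep -c 'SubDirectory=' "$piper_file"
```

Keeping each comment on a single line matters here, because a wrapped comment would leave a stray line that is not a comment in the piper file.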
> > > Sean
> > >
> > > ________________________________________
> > > From: Maral Amir <maraljav...@gmail.com>
> > > Sent: Thursday, July 18, 2019 7:11 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: cTAKES Pipeline [EXTERNAL]
> > >
> > > Hi Sean,
> > >
> > > Thank you so much for your very helpful and comprehensive response. I was
> > > able to generate the xmi results in the output directory and used the UIMA
> > > CAS Visual Debugger (CVD) as suggested to view the information. I have two
> > > questions:
> > > 1. What is the best reference for me to study and understand the
> > > annotations?
> > > 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
> > > readable format without the help of CVD.
> > >
> > > Thanks,
> > > Maral
> > >
> > >
> > > On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
> > > sean.fi...@childrens.harvard.edu> wrote:
> > >
> > > > Hi Maral,
> > > >
> > > > This might be what you are talking about with respect to the Default
> > > > Clinical Pipeline
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> > > >
> > > > That lists a command line method for running a set of files and
> > > > getting xml output.
> > > >
> > > > The default clinical pipeline configuration is actually contained in
> > > > the plain text (piper) file
> > > > resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> > > >
> > > > If you are looking at source code then the file is
> > > > ctakes-clinical-pipeline-res/src/main/resources/ ...
> > > >
> > > > You can also select and run a piper file with a gui
> > > >
> > > > https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
> > > >
> > > > Both methods are mentioned near the bottom of one of the pages
> > > > detailing pipeline configuration
> > > >
> > > > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > > >
> > > > There are several example pipelines constructed with code and/or plain
> > > > text files in the ctakes-examples and ctakes-examples-res modules. You
> > > > can look at the different "Hello World" examples.
> > > >
> > > > Since you are playing with maven, you can run the profile
> > > > "runPiperGui".
> > > > mvn clean compile -DskipTests -PrunPiperGui
> > > >
> > > > Sean
> > > >
> > > >
> > > > ________________________________________
> > > > From: Maral Amir <maraljav...@gmail.com>
> > > > Sent: Thursday, July 18, 2019 2:29 PM
> > > > To: dev@ctakes.apache.org
> > > > Subject: cTAKES Pipeline [EXTERNAL]
> > > >
> > > > Hi,
> > > >
> > > > > I just built my developer version of cTAKES with the help of the
> > > > > wonderful cTAKES developers.
> > > > >
> > > > > For my next step, I would appreciate it if somebody could direct me to
> > > > > the right path. I am planning to process clinical text documents through
> > > > > the entire pipeline to generate xml output. I see the website suggests
> > > > > walking through the Default Clinical Pipeline. I understand there are also
> > > > > multiple git repositories of command line tools based on Apache cTAKES.
> > > > > My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
> > > > > into one pipeline and have some form of web service at the end. I would
> > > > > deeply appreciate any suggestions.
> > > >
> > > > Thanks,
> > > > Maral
> > > >
> > >
> >
>
>
> --
> Siamak Barzegar, PhD.
> Senior Research Engineer.
> Biomedical Text Mining Unit.
> Barcelona Supercomputing Centre
>
