Hi Sean,

Thank you for your reply. Your response to Siamak was very helpful and it answered many of my questions. I appreciate your kind help.

Best,
Maral

On Mon, Jul 22, 2019 at 7:25 AM Finan, Sean <[email protected]> wrote:

> Hi Siamak,
>
> Just to clarify, $CTAKES_HOME isn't required by ctakes itself. It is in the bin/ scripts just to make the java command at the end of the script more concise.
> The script attempts to set it to the directory in which you installed a binary installation of ctakes.
> The $CTAKES_HOME directory should contain the ctakes directories bin/ lib/ resources/ ...
>
> Also, please be aware that those bin/ scripts are not meant to be used with a developer installation.
> The scripts are meant to be used with a built binary installation of ctakes - one with bin/ lib/ resources/ ... subdirectories all in one [ctakes home, ctakes root] directory.
> A developer [root] directory has subdirectories ctakes-core/ ctakes-core-res/ ctakes-clinical-pipeline/ ctakes-clinical-pipeline-res/ ...
>
> Can I assume, since you are writing java code (using PipelineBuilder), that you have some experience with java?
> You can put PipelineBuilder in any main(..) method and then start that main(..) from a command line just as you would any other java program.
> Just like any other java program, you need to have your $CLASSPATH set correctly and, for memory use, increase your maximum memory with -Xmx . These are VM options.
>
> The same goes for running the org.apache.uima.tools.cvd.CVD . It is just another regular old java class with a main(..) method.
>
> To use the CVD to inspect output you just need to make sure that you produce XMI files. With the PipelineBuilder this is really easy.
> After adding the AEs to the pipeline, just put .writeXmis( myOutputDirectory ) at the end of your builder.
>
> new PipelineBuilder()
>    .readFiles( myInDir )
>    .add( SimpleSegmentAnnotator.class )   // for instance
>    // .add( etc. )
>    .writeXmis( myOutDir )
>    .run();
>
> Or some variation of that.
> See the ctakes-examples module for more, for instance org.apache.ctakes.examples.pipeline HelloWorldBuilderRunner
>
> -- If you only need to do something special with the actual pipeline (e.g. special ae interfaces, uima control) after it has been built, then I recommend that you use the PiperFileReader and a piper file. A piper file helps make the pipeline more transparent (outside the code), and allows modification of the pipeline without recompiling (and reinstalling).
> Then you can get the PipelineBuilder from the PiperFileReader.
> See org.apache.ctakes.examples.pipeline HelloWorldPiperRunner
>
> PiperFileReader reader = new PiperFileReader( myPiperFilePath );
> PipelineBuilder builder = reader.getBuilder();
>
> --- If you don't need to do anything special with the pipeline then I recommend that you just use the PiperFileRunner, which does everything for you.
>
> java [VM options] org.apache.ctakes.core.pipeline.PiperFileRunner -p myPiperFilePath -i myInDir --writeXmis myOutDir
>
> One final note: CVD is not a ctakes tool and ctakes itself does not contain CVD code. The CVD is an Apache UIMA product, and help for it can be found online.
> https://uima.apache.org/d/uimaj-current/tools.html
>
> runctakesCVD is a convenience script, and its name may be a little misleading. Something like runUimaCvd might cause less confusion; let us know your thoughts on changing the name. For instance, would a different name make looking for help faster or easier?
>
> Sean
>
> ________________________________________
> From: Siamak Barzegar <[email protected]>
> Sent: Monday, July 22, 2019 4:38 AM
> To: [email protected]
> Subject: Re: cTAKES Pipeline [EXTERNAL]
>
> Hi Sean,
>
> I have the same question (I want to run runctakesCVD or ..CPE on my modified code and descriptors, not using an IDE, so what should I set my CTAKES_HOME variable to?). I do not use the Piper Gui. I am using PipelineBuilder on the source code.
> It is important to see the results with runctakesCVD as well. But what should I set my CTAKES_HOME variable to in the runctakesCVD.sh file?
> java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/* -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cvd.CVD "$@"
>
> With Best Wishes,
> Siamak
>
> On Fri, 19 Jul 2019 at 20:12, Maral Amir <[email protected]> wrote:
>
> > Hi Sean,
> >
> > Thank you so much for your insightful response.
> > I'm having a problem linking the piper files. I should mention I am using the command line interface. Could you please kindly let me know:
> >
> > 1. What I should set my CTAKES_HOME variable to. Right now I set my CTAKES_HOME to my cTAKES user installation main folder. That is because I could see in the last line of the runPiperFile.sh, the class directory $CTAKES_HOME/lib/* is included and no /lib folder is present in the developer's version.
> >
> > java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/* -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
> >
> > Also,
> >
> > 2. Where is the *"bin"* folder where the bash file resides. Right now I use this one:
> > /Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
> >
> > Thanks,
> > Maral
> >
> > On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <[email protected]> wrote:
> >
> > > Hi Maral,
> > >
> > > You can generate different output types by adding different writers to the end of the pipeline.
> > > Here are the contents of the Default Clinical Pipeline piper file:
> > >
> > > ========================================================================================
> > > // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
> > >
> > > // Load a simple token processing pipeline from another pipeline file
> > > load DefaultTokenizerPipeline
> > >
> > > // Add non-core annotators
> > > add ContextDependentTokenizerAnnotator
> > > addDescription POSTagger
> > >
> > > // Add Chunkers
> > > load ChunkerSubPipe
> > >
> > > // Default fast dictionary lookup
> > > load DictionarySubPipe
> > >
> > > // Add Cleartk Entity Attribute annotators
> > > load AttributeCleartkSubPipe
> > >
> > > ========================================================================================
> > >
> > > I recommend that you copy those lines to a new file (for instance, Maral.piper) and then add the following lines:
> > >
> > > ========================================================================================
> > > // Write marked copy of note text in interactive html files
> > > add pretty.html.HtmlTextWriter SubDirectory=HTML
> > >
> > > // Write Fast Healthcare Interoperability Resources (FHIR) json files. fhir.org
> > > package org.apache.ctakes.fhir.cc
> > > add FhirJsonFileWriter SubDirectory=FHIR
> > >
> > > // Write plaintext copy of note text with cui, semantic group, POS. Relations are listed.
> > > add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
> > >
> > > // Write plaintext copy of note sentences with entity and relation discoveries listed.
> > > add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
> > >
> > > ========================================================================================
> > >
> > > The output directory should then contain some new output in different subdirectories.
> > > You can change the subdirectory names.
> > >
> > > Note: the "=================================" are just there to indicate what goes in the file. Do not copy them.
> > >
> > > There are many more file writers, most of which write simple lists of discoveries in one form or another.
> > > I recommend trying the 4 above and seeing if any fit your purposes before moving on to more specialized writers.
> > >
> > > Sean
> > >
> > > ________________________________________
> > > From: Maral Amir <[email protected]>
> > > Sent: Thursday, July 18, 2019 7:11 PM
> > > To: [email protected]
> > > Subject: Re: cTAKES Pipeline [EXTERNAL]
> > >
> > > Hi Sean,
> > >
> > > Thank you so much for your very helpful and comprehensive response. I was able to generate the xmi results in the output directory and used UIMA Cas Visual Debugger (CVD) as suggested to view the information. I have two questions:
> > > 1. What is the best reference for me to study and understand the annotations?
> > > 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a readable format without the help of CVD.
> > >
> > > Thanks,
> > > Maral
> > >
> > > On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <[email protected]> wrote:
> > >
> > > > Hi Maral,
> > > >
> > > > This might be what you are talking about with respect to the Default Clinical Pipeline
> > > >
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Default-2BClinical-2BPipeline&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=WJJB6qjAiCjVDSuwYgcYjXv0EenGbCblnUGl8Rc5V9I&s=cBb87McNP4vp678BVVM6z9Wwfr_CQNb--5XKAUPDxYM&e=
> > > >
> > > > That lists a command line method for running a set of files and getting xml output.
> > > >
> > > > The default clinical pipeline configuration is actually contained in the plain text (piper) file
> > > > resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> > > >
> > > > If you are looking at source code then the file is
> > > > ctakes-clinical-pipeline-res/src/main/resources/ ...
> > > >
> > > > You can also select and run a piper file with a gui
> > > >
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFile-2BSubmitter-2BGUI&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=WJJB6qjAiCjVDSuwYgcYjXv0EenGbCblnUGl8Rc5V9I&s=lTtwFsqMJEl1M73fifRpWrO6BZX_R0d2gh3HOqvAx90&e=
> > > >
> > > > Both methods are mentioned near the bottom of one of the pages detailing pipeline configuration
> > > >
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFiles&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=WJJB6qjAiCjVDSuwYgcYjXv0EenGbCblnUGl8Rc5V9I&s=0VYZQYTmgYmbRW_vsbf8XACzsVWdetpqSxeDj_c8RKA&e=
> > > >
> > > > There are several example pipelines constructed with code and/or plain text files in the ctakes-examples and ctakes-examples-res modules. You can look at the different "Hello World" examples.
> > > >
> > > > Since you are playing with maven, you can run the profile "runPiperGui".
> > > > mvn clean compile -DskipTests -PrunPiperGui
> > > >
> > > > Sean
> > > >
> > > > ________________________________________
> > > > From: Maral Amir <[email protected]>
> > > > Sent: Thursday, July 18, 2019 2:29 PM
> > > > To: [email protected]
> > > > Subject: cTAKES Pipeline [EXTERNAL]
> > > >
> > > > Hi,
> > > >
> > > > I just built my developer version of cTAKES with the help of the wonderful cTAKES developers.
> > > >
> > > > For my next step, I would appreciate it if somebody could direct me to the right path.
> > > > I am planning to process text clinical documents through the entire pipeline to generate xml output. I see the website suggests walking through the Default Clinical Pipeline. I understand there are also multiple git repositories of command line tools built on Apache cTAKES.
> > > > My final goal is to integrate cTAKES with some Python packages (OCR, etc.) into one pipeline and have some form of web service at the end. I would deeply appreciate any suggestions.
> > > >
> > > > Thanks,
> > > > Maral
>
> --
> Siamak Barzegar, PhD.
> Senior Research Engineer.
> Biomedical Text Mining Unit.
> Barcelona Supercomputing Centre
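
As a side note on the thread above: the java command at the end of the bin/ scripts can also be assembled in plain Java, which may help when launching PiperFileRunner from other tooling. This is only a minimal sketch under stated assumptions. The class name PiperCommandSketch, the /opt/ctakes fallback path, and the in/out directory names are hypothetical. The classpath entries, VM options, and PiperFileRunner arguments are copied as written in the messages above and have not been checked against a particular ctakes release. It assumes a binary installation (bin/ lib/ resources/ ...) and a Unix-style ':' path separator, and it only builds and prints the command without running it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ctakes: it only assembles the command line
// that the bin/ scripts quoted in this thread would run.
public class PiperCommandSketch {

    // Classpath as it appears in the quoted runctakesCVD.sh command,
    // joined with ':' (Unix path separator, an assumption of this sketch).
    public static String classpath(String ctakesHome) {
        return String.join(":",
                ctakesHome + "/desc/",
                ctakesHome + "/resources/",
                ctakesHome + "/lib/*");
    }

    // Builds the argument list for PiperFileRunner. The -p, -i and --writeXmis
    // arguments are taken verbatim from the thread and are not verified here.
    public static List<String> piperCommand(String ctakesHome, String piperFile,
                                            String inDir, String outDir) {
        return new ArrayList<>(Arrays.asList(
                "java",
                "-cp", classpath(ctakesHome),
                "-Dlog4j.configuration=file:" + ctakesHome + "/config/log4j.xml",
                "-Xms512M", "-Xmx3g",   // VM options from the quoted scripts
                "org.apache.ctakes.core.pipeline.PiperFileRunner",
                "-p", piperFile,
                "-i", inDir,
                "--writeXmis", outDir));
    }

    public static void main(String[] args) {
        // /opt/ctakes is a placeholder used when CTAKES_HOME is not set.
        String home = System.getenv().getOrDefault("CTAKES_HOME", "/opt/ctakes");
        List<String> cmd = piperCommand(home, "Maral.piper", "in", "out");
        // Print only; pass the list to ProcessBuilder to actually launch it.
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command as a list of separate arguments rather than one string means it can be handed to ProcessBuilder unchanged, with no shell involved, so the lib/* entry reaches the JVM unexpanded.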
