Thanks Sean - I am finally getting somewhere now. I am able to run the
following .piper using runPiperFile.bat:
// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
// Text Files Reader
// Reads document texts from text files specified in a provided list.
# files The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters"
// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger
// Add Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
add DefaultJCasTermAnnotator
// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper
// Pretty Text Writer
// Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
# OutputDirectory Directory for all output files.
# SubDirectory SubDirectory for files.
add org.apache.ctakes.core.cc.pretty.plaintext.PrettyTextWriterFit OutputDirectory="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\user_pipelines\test_output"
However, I run into a permissions issue on my own filestore (?!):
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
17 Jul 2017 20:36:01 ERROR PiperFileRunner - C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters (Access is denied)
I've also tried running the batch file as administrator, but I still get the
same error. Do you have any ideas?
Thanks,
Arron.
-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: 17 July 2017 17:32
To: [email protected]
Subject: RE: Filter CVD output? [EXTERNAL]
Hi Arron,
In your version of the clinical pipeline gui you just need to set the value of
OutputDirectory:
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/directory
In the pipeline creator gui you should be able to click the button with a
folder icon to the right of "OutputDirectory" in the central table and use a
file browser. Or you can edit the piper manually (far right panel).
I am not sure why the piper validates. If OutputDirectory is not set then
validation should report that the piper is not valid, so this is probably a bug
in validation.
If you think that the piper is valid then you can save it and then try to run
via command line with the bin/runPiperFile script in ctakes-distribution or via
the PiperFileRunner class in core. See near the bottom of
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
More on that later.
Are you using the 4.0 release or trunk? I ask for two reasons:
- The latest HtmlTextWriter in trunk is much better than that in the 4.0 release
- Trunk contains the PiperRunnerGui in org.apache.ctakes.gui.pipeline
I advise that you use ctakes trunk.
The PiperRunnerGui does two things for you:
- It makes setting command-line parameters easy
- It allows you to save command-line parameters so that you don't need to hard
code things like OutputDirectory into your piper file.
Check
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
More on that later.
The default clinical pipeline piper actually is a complete end-to-end pipeline.
I am to blame for the absent documentation. I should probably put more detailed
information on the page for the default pipeline itself:
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
And maybe document the defaults that apply to all pipers on:
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
If no reader is specified but InputDirectory is set, then the FileTreeReader is
used by default.
If a value is specified for the "--xmiOut" command-line parameter then the
FileTreeXmiWriter is used. InputDirectory can be set using -i on the command
line.
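As a rough, untested sketch of those defaults: a piper that declares no reader
and no xmi writer, with both directories supplied at launch. The file name
MyPipeline.piper, the directories and the -p option for the piper path are
assumptions for illustration; only -i and --xmiOut are described above, and the
UMLS user/pass options are left out.

```
// MyPipeline.piper (hypothetical name): no reader line and no xmi writer line.
load DefaultTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
add DefaultJCasTermAnnotator
load AttributeCleartkSubPipe.piper
// Assumed launch, with placeholder directories:
//   bin/runPiperFile -p MyPipeline.piper -i /my/input --xmiOut /my/xmi
// -i sets InputDirectory, so the FileTreeReader is used by default.
// --xmiOut has a value, so the FileTreeXmiWriter is used.
```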
The piper file submitter gui will read your piper file and provide text boxes
for all available "cli" options, including those that are custom for the piper
file. This always includes the options for the default clinical pipeline even
if they aren't necessary. That is to say that --xmiOut will be available to
set but you don't need to do so. Ditto for OutputDirectory, Umls user/pass,
etc. They are always there for convenience as those are standard options. So,
you don't need to set OutputDirectory in your version of the clinical pipeline.
Just use the gui and set it on the gui. You can save and reload your option
values if you plan to keep using the same values. It is basically a pretty
equivalent to what could otherwise be done with the runPiperFile script or
PiperFileRunner class.
As for example complete pipeline piper files, you can find some in
ctakes-examples-res org.apache.ctakes.examples.pipeline:
HelloWorld.piper
HelloWorldAssertProps.piper
HelloWorldCui.piper
HelloWorldProps.piper
HelloWorldTkProps.piper
ProcessDir.piper
The HelloWorld pipers have launch classes in ctakes-examples
org.apache.ctakes.examples.pipeline that simply provide a string of text for
processing.
The ProcessDir piper is more independent and uses the readFiles command to
process a directory tree of example notes.
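For illustration only, a directory-processing piper in that style might look
like the sketch below. It is not the contents of ProcessDir.piper; both
directory paths are placeholders, and the argument form of readFiles (a single
directory path) is an assumption.

```
// Hypothetical sketch, not the actual ProcessDir.piper.
// Read every file in a directory tree (placeholder path).
readFiles /my/input/notes
// Same annotators as the default fast pipeline discussed in this thread.
load DefaultTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
add DefaultJCasTermAnnotator
load AttributeCleartkSubPipe.piper
// Write one html file per document (placeholder output path).
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/output/html
```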
I hope that covers all of your questions, but let me know if anything is
terribly unclear. This is a good indication that I need to improve the
documentation.
Sean
-----Original Message-----
From: Lacey A.S. [mailto:[email protected]]
Sent: Monday, July 17, 2017 11:21 AM
To: [email protected]
Subject: RE: Filter CVD output? [EXTERNAL]
Hi Sean - thanks for such a quick reply.
This sounds interesting and something that would help me convey what has been
found to non-nlpers. I do all of my processing just through CVD / CPE using the
fastUMLSProcessor. So using the nice pipeline creator GUI I have got this far
(by importing the existing /clinical/pipeline/DefaultFastPipeline.piper):
// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
// Load a simple token processing pipeline from another pipeline file
# files The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Documents\200 letters\Epi_Let192.docx"
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger
// Add Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
add DefaultJCasTermAnnotator
// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper
// HTML Writer
// Writes html files with document text and simple markups (Semantic Group, CUI, Negation).
# OutputDirectory Directory for all output files.
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory
It validates fine (yellow button in the pipeline creator), but the option to
actually run it (green button) is not available yet. I'm guessing I'm missing
some pipe bits?
In fact, does anyone have an example "start to finish" .piper file?
Arron
-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: 17 July 2017 15:01
To: [email protected]
Subject: RE: Filter CVD output? [EXTERNAL]
Hi A.S.,
If you are interested in showing medical terms discovered in text to
non-nlpers, you could try adding the html writer to your pipeline.
ctakes-core org/apache/ctakes/core/cc/pretty/html/HtmlTextWriter.java
It creates an html file that displays the document text marked with green, red,
yellow and orange underlines for affirmed, negated, uncertain and
uncertain-negated medical terms, respectively. These would be the typical anatomical site,
sign/symptom, disease/disorder, medication, procedure mentions. Tooltips
appear over the text indicating the semantic type. You can click on the
mention and marked-up details will be displayed on the right with polarity,
semantic type, cui, document text and preferred text. Overlapping terms are
also handled by the tooltips and details panel.
The document title (usually filename) is a header at the top of the document,
and section headers are displayed larger and normalized. They are also
clickable. This of course requires a sectionizer in the pipeline. The html
file is named after the document name. html files are saved in a location
indicated by the parameter "OutputDirectory".
I would like to, in the future, mark up times, lists, and relations. For now,
as long as the purpose is displaying mentions to a non-nlper and possibly even
passing system output to people that don't have specialized readers (e.g. cvd),
the html writer should be useful for a lot of people.
Sean
-----Original Message-----
From: Kean Kaufmann [mailto:[email protected]]
Sent: Monday, July 17, 2017 9:30 AM
To: [email protected]
Subject: Re: Filter CVD output? [EXTERNAL]
Hi A.S.,
Does the "Show Selected Annotations" menu item serve your purposes?
https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu
On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S. <[email protected]> wrote:
> Hi - I spend a lot of time showing doctors the output of cTakes via
> what I have parsed during post processing. The problem is that there is
> no context, visually anyway, showing where in the letter each term has
> been pulled from.
>
> It would be great if I could sit down and run a letter through the CVD
> program and filter the output to just medical mentions?
>