RE: Filter CVD output? [EXTERNAL]
Thanks again Sean - I now have some nice html files with annotations popping up when hovering over them.

"I would like to, in the future, mark up times, lists, and relations. For now, as long as the purpose is displaying mentions to a non-nlper and possibly even passing system output to people that don't have specialized readers (e.g. cvd), the html writer should be useful for a lot of people."

This would be very interesting, and even the ability to mark up user-defined annotations / dictionary items would be great, i.e. drug names get picked up, but the units / dose etc. would also be really good.

All the best,
Arron.

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: 17 July 2017 15:01
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi A.S.,

If you are interested in showing medical terms discovered in text to non-nlpers, you could try adding the html writer to your pipeline.

ctakes-core
org/apache/ctakes/core/cc/pretty/html/HtmlTextWriter.java

It creates an html file that displays the document text marked with green, red, yellow and orange underlines for affirmed, negated, uncertain, uncertain-negated medical terms. These would be the typical anatomical site, sign/symptom, disease/disorder, medication, procedure mentions.

Tooltips appear over the text indicating the semantic type. You can click on the mention and marked-up details will be displayed on the right with polarity, semantic type, cui, document text and preferred text. Overlapping terms are also handled by the tooltips and details panel.

The document title (usually filename) is a header at the top of the document, and section headers are displayed larger and normalized. They are also clickable. This of course requires a sectionizer in the pipeline.

The html file is named after the document name. html files are saved in a location indicated by the parameter "OutputDirectory".

I would like to, in the future, mark up times, lists, and relations. For now, as long as the purpose is displaying mentions to a non-nlper and possibly even passing system output to people that don't have specialized readers (e.g. cvd), the html writer should be useful for a lot of people.

Sean

-----Original Message-----
From: Kean Kaufmann [mailto:k...@recordsone.com]
Sent: Monday, July 17, 2017 9:30 AM
To: dev@ctakes.apache.org
Subject: Re: Filter CVD output? [EXTERNAL]

Hi A.S.,

Does the "Show Selected Annotations" menu item serve your purposes?
https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu

On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S. <a.s.la...@swansea.ac.uk> wrote:

> Hi - I spend a lot of time showing doctors the output of cTakes via
> what I have parsed during post processing. Problem being there is no
> context of where in the letter each term has been pulled from, visually
> anyway.
>
> It would be great if I could sit down and run a letter through the CVD
> program and filter the output to just medical mentions?
>
> Sent from Nine <http://www.9folders.com/>
Re: Filter CVD output? [EXTERNAL]
It's a directory! Problem solved. Thanks Sean. And I will try out the FileTreeReader in its place...

Sent from Nine <http://www.9folders.com/>

From: "Finan, Sean" <sean.fi...@childrens.harvard.edu>
Sent: 17 Jul 2017 21:08
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Arron,

The TextReader is a fairly old class - it was written before I joined and I've never used it myself. I don't know why it would claim that it doesn't have access.

For files I always use the FileTreeReader. If I only want to read one file I just throw a copy into a directory by itself.

On that note, is "200 letters" a file or a directory? If it is a directory then that is your answer. TextReader wants a list of individual file names. If it gets a directory name then it doesn't handle the matter gracefully; it just throws an exception and fails.

Sean

-----Original Message-----
From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
Sent: Monday, July 17, 2017 3:41 PM
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Thanks Sean - I am finally getting somewhere now. I am able to run the following .piper using runPiperFile.bat

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Text Files Reader
// Reads document texts from text files specified in a provided list.
# files The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters"

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

// Pretty Text Writer
// Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
# OutputDirectory Directory for all output files.
# SubDirectory SubDirectory for files.
add org.apache.ctakes.core.cc.pretty.plaintext.PrettyTextWriterFit OutputDirectory="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\user_pipelines\test_output"

However I run into a permissions issue on my own filestore (?!)

Loading configuration.
Loading feature templates.
Loading lexica.
Loading model: Loading model: .
17 Jul 2017 20:36:01 ERROR PiperFileRunner - C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters (Access is denied)

I've also tried running the batch file as administrator but still the same. Do you have any ideas?

Thanks,
Arron.

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: 17 July 2017 17:32
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Arron,

In your version of the clinical pipeline gui you just need to set the value of OutputDirectory:
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/directory

In the pipeline creator gui you should be able to click the button with a folder icon to the right of "OutputDirectory" in the central table and use a file browser. Or you can edit the piper manually (far right panel).

I am not sure why the piper validates. If OutputDirectory is not set then it is a bug in validation: it should claim that the piper is not valid. It is probably a bug.

If you think that the piper is valid then you can save it and then try to run via command line with the bin/runPiperFile script in ctakes-distribution o
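For the directory case described in the message above, a minimal sketch of the reader swap might look like the following. It assumes that FileTreeReader sits in the same org.apache.ctakes.core.cr package as TextReader and that it accepts the InputDirectory parameter Sean mentions elsewhere in the thread; neither detail is spelled out in these emails, so both should be checked against the cTAKES version in use. The path is the one from Arron's piper.

```
// Sketch only: replaces the TextReader line, which expects individual file names.
// Assumed, not confirmed in this thread: the package org.apache.ctakes.core.cr
// and the parameter name InputDirectory.
reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters"
```

The rest of the piper (tokenizer, chunkers, dictionary lookup, writer) would stay as it is.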
RE: Filter CVD output? [EXTERNAL]
Hi Arron,

In your version of the clinical pipeline gui you just need to set the value of OutputDirectory:
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/directory

In the pipeline creator gui you should be able to click the button with a folder icon to the right of "OutputDirectory" in the central table and use a file browser. Or you can edit the piper manually (far right panel).

I am not sure why the piper validates. If OutputDirectory is not set then it is a bug in validation: it should claim that the piper is not valid. It is probably a bug.

If you think that the piper is valid then you can save it and then try to run via command line with the bin/runPiperFile script in ctakes-distribution or via the PiperFileRunner class in core. See the near-bottom of https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
More on that later.

Are you using the 4.0 release or trunk? I ask for two reasons:
- The latest HtmlTextWriter in trunk is much better than that in the 4.0 release
- Trunk contains the PiperRunnerGui in org.apache.ctakes.gui.pipeline

I advise that you use ctakes trunk.

The PiperRunnerGui does two things for you:
- It makes setting command-line parameters easy
- It allows you to save command-line parameters so that you don't need to hard code things like OutputDirectory into your piper file.

Check https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
More on that later.

The default clinical pipeline piper actually is a complete end-to-end pipeline. I am to blame for absent documentation. I should probably have more detailed information on the page on the default pipeline itself
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
And maybe piper defaults for all pipers on
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

If no reader is specified but InputDirectory is set, then the FileTreeReader is used by default. If a value is specified for the "--xmiOut" command-line parameter then the FileTreeXmiWriter is used. InputDirectory can be set using -i on the command line.

The piper file submitter gui will read your piper file and provide text boxes for all available "cli" options, including those that are custom for the piper file. This always includes the options for the default clinical pipeline even if they aren't necessary. That is to say that --xmiOut will be available to set but you don't need to do so. Ditto for OutputDirectory, Umls user/pass, etc. They are always there for convenience as those are standard options.

So, you don't need to set OutputDirectory in your version of the clinical pipeline. Just use the gui and set it on the gui. You can save and reload your option values if you plan to keep using the same values. It is basically a pretty equivalent to what could otherwise be done with the runPiperFile script or PiperFileRunner class.

As for example complete pipeline piper files, you can find some in ctakes-examples-res org.apache.ctakes.examples.pipeline:
HelloWorld.piper
HelloWorldAssertProps.piper
HelloWorldCui.piper
HelloWorldProps.piper
HelloWorldTkProps.piper
ProcessDir.piper

The HelloWorld pipers have launch classes in ctakes-examples org.apache.ctakes.examples.pipeline that simply provide a string of text for processing. The ProcessDir piper is more independent and uses the readFiles command to process a directory tree of example notes.

I hope that covers all of your questions, but let me know if anything is terribly unclear. This is a good indication that I need to improve the documentation.

Sean

-----Original Message-----
From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
Sent: Monday, July 17, 2017 11:21 AM
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Sean - thanks for such a quick reply. This sounds interesting and something that would help me convey what has been found to non-nlpers.

I do all of my processing just through CVD / CPE using the fastUMLSProcessor. So using the nice pipeline creator GUI I have got this far (by importing the existing /clinical/pipeline/DefaultFastPipeline.piper):

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Load a simple token processing pipeline from another pipeline file
# files The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Documents\ 200 letters\Epi_Let192.docx"
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

// HTML Writer
// Writes html files with document text and simple markups (Semantic Group, CUI, Negation).
# OutputD
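Taken together, Sean's points in this message suggest a piper that hard-codes neither the input nor the output location. The sketch below reuses only lines that already appear in this thread. It assumes, going by Sean's description, that leaving out the reader causes the FileTreeReader to be used when InputDirectory is supplied (for example with -i), and that OutputDirectory is supplied at launch through the gui or the command line. It has not been run against any particular cTAKES version.

```
// Sketch only: no reader line and no OutputDirectory value in the file.
// Assumed: InputDirectory and OutputDirectory are both provided at launch.

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

// HTML Writer, with its OutputDirectory left to the launcher
add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter
```

Keeping paths out of the file means the same piper can be pointed at a different directory of letters without editing it, which is the convenience Sean attributes to the saved gui options.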
RE: Filter CVD output? [EXTERNAL]
Hi A.S.,

If you are interested in showing medical terms discovered in text to non-nlpers, you could try adding the html writer to your pipeline.

ctakes-core
org/apache/ctakes/core/cc/pretty/html/HtmlTextWriter.java

It creates an html file that displays the document text marked with green, red, yellow and orange underlines for affirmed, negated, uncertain, uncertain-negated medical terms. These would be the typical anatomical site, sign/symptom, disease/disorder, medication, procedure mentions.

Tooltips appear over the text indicating the semantic type. You can click on the mention and marked-up details will be displayed on the right with polarity, semantic type, cui, document text and preferred text. Overlapping terms are also handled by the tooltips and details panel.

The document title (usually filename) is a header at the top of the document, and section headers are displayed larger and normalized. They are also clickable. This of course requires a sectionizer in the pipeline.

The html file is named after the document name. html files are saved in a location indicated by the parameter "OutputDirectory".

I would like to, in the future, mark up times, lists, and relations. For now, as long as the purpose is displaying mentions to a non-nlper and possibly even passing system output to people that don't have specialized readers (e.g. cvd), the html writer should be useful for a lot of people.

Sean

-----Original Message-----
From: Kean Kaufmann [mailto:k...@recordsone.com]
Sent: Monday, July 17, 2017 9:30 AM
To: dev@ctakes.apache.org
Subject: Re: Filter CVD output? [EXTERNAL]

Hi A.S.,

Does the "Show Selected Annotations" menu item serve your purposes?
https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu

On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S. <a.s.la...@swansea.ac.uk> wrote:

> Hi - I spend a lot of time showing doctors the output of cTakes via
> what I have parsed during post processing. Problem being there is no
> context of where in the letter each term has been pulled from, visually
> anyway.
>
> It would be great if I could sit down and run a letter through the CVD
> program and filter the output to just medical mentions?
>
> Sent from Nine <http://www.9folders.com/>