Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Ryan Young Mon, 30 Mar 2020 18:44:58 -0700

Hello Sean,

I have run into some difficulty actually running the script you wrote
(SentenceFirstCuiWriter.java). I spent the last week doing the following:
1.) Installed cTAKES developer version using Eclipse IDE
2.) Added the appropriate import statements at the beginning of
SentenceFirstCuiWriter.java
3.) Placed SentenceFirstCuiWriter.java in this directory:
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
4.) Successfully built and compiled cTAKES developer version
5.) Successfully ran the test configurations which were already in cTAKES
in Eclipse (Run --> Run As --> Maven test)


My main question is how do I run the cTAKES developer version from command
line without running Eclipse or Maven?

I found a post you made last year (
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E).
You stated, *"You can put PipelineBuilder in any main(..) method and then
start that main(..) from a command line just as you would any other java
program.  Just like any other java program, you need to have your
$CLASSPATH set correctly and, for memory use, increase your maximum memory
with -Xmx .  These are VM options."*

I think this is what I have to do. But, I am unsure of how to accomplish
this exactly. What I have tried already is:
1.) Launch Command Prompt
2.) Change directory to where PipelineBuilder.java is located
cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
3.) Enter the following into Command Prompt
java org.apache.ctakes.core.pipeline.PiperFileRunner -p C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis C:\Users\Ryan\SkyDrive\Desktop\Output_Folder

I receive the following error in Command Prompt:
Error: Could not find or load main class org.apache.ctakes.core.pipeline.PiperFileRunner
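
For context, this error generally means the JVM could not find the compiled
class on its classpath. Changing into a source directory does not help,
because java looks for compiled .class files or jars on the classpath, not
for .java files. The small check below is only a sketch and is not part of
cTAKES: ClasspathCheck and isOnClasspath are hypothetical names. It reports
whether a given class is visible with the classpath it was started with.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.ctakes.core.pipeline.PiperFileRunner";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(name + (isOnClasspath(name) ? " was found" : " was NOT found"));
    }
}
```

Compile it with javac, then run it with the same -cp value intended for
the real command (entries are separated by ; on Windows). If the class is
reported as not found, the classpath is the thing to fix.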

I am probably missing something. Just not sure what exactly. I'm not too
familiar with Java. The documentation I have been reading hasn't been as
helpful since cTAKES is a much more complex project than the simple
examples they provide.

Lastly, I am using Windows 10.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences

On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <[email protected]> wrote:

> Hello Sean,
>
> Wow! This was a lot more than I was anticipating! Thank you very much!
>
> To answer your questions...
> • I am using Windows 10
> • I have the Python script call a shell command to run a batch file. The
> batch file just contains the following line:
> "C:\cTAKES_4.0.0\bin\runPiperFile.bat"  -p "C:\path\to\piper.piper"
> • The Python script waits for the shell command to complete (i.e., when
> cTAKES is finished processing)
> • The Python script will then parse the output text files and then
> continue on with the code
>
> Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
> workaround I had created was to save each line of the surgery list column
> in the dataframe to a different text file to make it easier for when I had
> to parse the output cTAKES text file. As I had mentioned previously, I
> would like to have just 1 input text file and 1 output text file (as long
> as the output file can be easily parsed by Python).
>
> Regarding my coding background, I don't have much background in Java.
> However, a few years ago, I had no knowledge of Python either, but I was
> able to teach myself while in medical school.
>
> A few more questions for you...
> 1.) Should I save the code you posted in the following location as a .jar
> file?
> C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
>
> 2.) Should I replace "add CuiLookupLister" with "add
> SentenceFirstCuiWriter" in the piper file or do I need both?
>
> 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it
> leave a blank, N/A, or NaN value? Having any of these values would
> definitely help when I have Python parse the output text file. When I have
> Python read the output text file, I would have it delete any dataframe rows
> with NaN or N/A in the CUI column.
>
> Thank you very much for your assistance!
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <
> [email protected]> wrote:
>
>> Hi Ryan,
>>
>> Here is some code for a writer that will do what you want.
>> To use it, get rid of those first two lines in the piper that I sent
>> (set, reader).
>> The default reader will work just fine, and it will allow you to process
>> multiple surgery lists in one run.
>>
>> Then just add SentenceFirstCuiWriter to the end of your piper.
>>
>> Sean
>>
>>
>> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>>
>>    // Writes one CUI per sentence: the first CUI of the earliest procedure.
>>    public void writeFile( final JCas jCas, final String outputDir,
>>                           final String documentId, final String fileName )
>>          throws IOException {
>>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>>       // Map each sentence to the procedure mentions it covers.
>>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
>>       // Sort the entries by sentence using DefaultAspanComparator.
>>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>>             = sentenceMap.entrySet()
>>                          .stream()
>>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
>>                          .map( Map.Entry::getValue )
>>                          .collect( Collectors.toList() );
>>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
>>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
>>             // Pick the procedure with the smallest begin offset.
>>             ProcedureMention firstProcedure
>>                   = procedures.stream()
>>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
>>                               .orElse( null );
>>             if ( firstProcedure != null ) {
>>                String cui
>>                      = OntologyConceptUtil.getCuis( firstProcedure )
>>                                           .stream()
>>                                           .findFirst()
>>                                           .orElse( "" );
>>                // Nothing is written when no CUI is found.
>>                if ( !cui.isEmpty() ) {
>>                   writer.write( cui + "\n" );
>>                }
>>             }
>>          }
>>       }
>>    }
>> }
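
As posted, the writer above writes nothing for a sentence when no procedure
or CUI is found, so the output can have fewer lines than the input. The
sketch below shows one possible variation that emits a placeholder instead,
which keeps output rows aligned with input rows. It is not cTAKES code:
Mention, firstCui, firstCuis and the "N/A" placeholder are hypothetical
stand-ins, and the begin offsets are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FirstCuiPerLine {

    // Hypothetical stand-in for a procedure mention: a begin offset and its CUIs.
    static final class Mention {
        final int begin;
        final List<String> cuis;

        Mention(int begin, List<String> cuis) {
            this.begin = begin;
            this.cuis = cuis;
        }

        int getBegin() {
            return begin;
        }
    }

    // Assumed placeholder for lines with no CUI; any marker that is easy to filter would do.
    static final String PLACEHOLDER = "N/A";

    // First CUI of the earliest mention, or PLACEHOLDER if there is no mention or no CUI.
    static String firstCui(List<Mention> mentions) {
        return mentions.stream()
                .min(Comparator.comparingInt(Mention::getBegin))
                .flatMap(m -> m.cuis.stream().findFirst())
                .orElse(PLACEHOLDER);
    }

    // One output value per input sentence, in the same order.
    static List<String> firstCuis(List<List<Mention>> sentences) {
        List<String> result = new ArrayList<>();
        for (List<Mention> mentions : sentences) {
            result.add(firstCui(mentions));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Mention>> sentences = Arrays.asList(
                // "Colonoscopy with Polypectomy"
                Arrays.asList(new Mention(17, Arrays.asList("C0521210")),
                              new Mention(0, Arrays.asList("C0009378"))),
                // "Esophagogastroduodenoscopy Colonoscopy"
                Arrays.asList(new Mention(0, Arrays.asList("C0079304")),
                              new Mention(27, Arrays.asList("C0009378"))),
                // A line where nothing was recognized.
                Collections.<Mention>emptyList());
        // Prints C0009378, C0079304 and N/A, one per line.
        firstCuis(sentences).forEach(System.out::println);
    }
}
```

With a placeholder on every unmatched line, the output can be read back
row by row and the placeholder rows dropped afterwards.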
>>
>> ________________________________________
>> From: Ryan Young <[email protected]>
>> Sent: Monday, March 23, 2020 11:02 AM
>> To: [email protected]
>> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
>> (CUI) [EXTERNAL]
>>
>> Hello,
>>
>> I am a medical student who happened to come across cTAKES for a project I
>> am working on. What I would like to do is take a list of surgeries in a
>> text file and have cTAKES output what it determines to be the best UMLS
>> code (CUI) for that particular line.
>>
>> Each line of the text file is independent of the others (i.e., each line
>> should be read and analyzed separately). For example, here's my list of the
>> surgeries (Surgery_List.txt):
>> Colonoscopy with Polypectomy
>> Esophagogastroduodenoscopy Colonoscopy
>> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>>
>> When I run the piper file (see below), I get the following output:
>> Colonoscopy with Polypectomy
>> "Colonoscopy"
>>   Procedure
>>   C0009378 colonoscopy
>> "Polypectomy"
>>   Procedure
>>   C0521210 Resection of polyp
>>
>> Esophagogastroduodenoscopy Colonoscopy
>> "Esophagogastroduodenoscopy"
>>   Procedure
>>   C0079304 Esophagogastroduodenoscopy
>> "Colonoscopy"
>>   Procedure
>>   C0009378 colonoscopy
>>
>> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>> "Esophagogastroduodenoscopy"
>>   Procedure
>>   C0079304 Esophagogastroduodenoscopy
>> "Endoscopic ultrasound"
>>   Procedure
>>   C0376443 Endoscopic Ultrasound
>> "Endoscopic"
>>   Procedure
>>   C0014245 Endoscopy (procedure)
>> "ultrasound"
>>   Procedure
>>   C0041618 Ultrasonography
>> "Fine needle aspiration"
>>   Procedure
>>   C1510483 Fine needle aspiration biopsy
>> "aspiration"
>>   Procedure
>>   C0349707 Aspiration-action
>>
>> Here's the piper file I have been using:
>> reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
>> load DefaultTokenizerPipeline.piper
>>
>> SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
>> add ContextDependentTokenizerAnnotator
>> add org.apache.ctakes.necontexts.ContextAnnotator
>> addDescription POSTagger
>> load ChunkerSubPipe.piper
>> set ctakes.umlsuser=my_username ctakes.umlspw=my_password
>> add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
>> add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"
>>
>> The workaround I have developed is as follows...
>> 1.) Save each line of Surgery_List.txt to separate text files
>> 2.) Use a Python script to parse each individual text file to extract the
>> first UMLS code (CUI) given in the text file
>>
>> The above method works fine when there are only 10 lines, but not so well
>> when there are 40,000 lines in Surgery_List.txt.
>>
>> Ideally, I would like for Fast Dictionary Lookup to just return the top
>> result for each line of Surgery_List.txt. For example, Output.txt would
>> look just like this:
>> C0009378
>> C0079304
>> C0079304
>>
>> Just for reference here's how UMLS codes correspond between
>> Surgery_List.txt and Output.txt:
>> C0009378 --> Colonoscopy with Polypectomy
>> C0079304 --> Esophagogastroduodenoscopy Colonoscopy
>> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>>
>> Is there something I can add to the piper file to make this happen?
>>
>> Currently, I have the cTAKES user version installed, but I could install
>> the developer version if need be. I would just need a little guidance on
>> which Java source file I would need to modify to get the desired results.
>>
>> Thank You,
>>
>> Ryan Young
>> MD/MBA Candidate
>> Jacobs School of Medicine & Biomedical Sciences
>>
>
