Hello Sean,

I have run into some difficulty actually running the script you wrote (SentenceFirstCuiWriter.java). I spent the last week doing the following:
1.) Installed the cTAKES developer version using the Eclipse IDE
2.) Added the appropriate import statements at the beginning of SentenceFirstCuiWriter.java
3.) Placed SentenceFirstCuiWriter.java in this directory:
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
4.) Successfully built and compiled the cTAKES developer version
5.) Successfully ran the test configurations which were already in cTAKES in Eclipse (Run --> Run As --> Maven test)

My main question is: how do I run the cTAKES developer version from the command line without running Eclipse or Maven? I found a post you made last year (http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E). You stated,
*"You can put PipelineBuilder in any main(..) method and then start that main(..) from a command line just as you would any other java program. Just like any other java program, you need to have your $CLASSPATH set correctly and, for memory use, increase your maximum memory with -Xmx . These are VM options."*

I think this is what I have to do, but I am unsure of how to accomplish it exactly. What I have tried already is:
1.) Launch Command Prompt
2.) Change directory to where PipelineBuilder.java is located
cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
3.) Enter the following into Command Prompt
java org.apache.ctakes.core.pipeline.PiperFileRunner -p C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis C:\Users\Ryan\SkyDrive\Desktop\Output_Folder

I receive the following error in Command Prompt:
Error: Could not find or load main class org.apache.ctakes.core.pipeline.PiperFileRunner

I am probably missing something, but I am not sure what exactly. I'm not too familiar with Java. The documentation I have been reading hasn't been very helpful, since cTAKES is a much more complex project than the simple examples it provides.

Lastly, I am using Windows 10.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences

On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <[email protected]> wrote:

> Hello Sean,
>
> Wow! This was a lot more than I was anticipating! Thank you very much!
>
> To answer your questions...
> • I am using Windows 10
> • I have the Python script call a shell command to run a batch file.
>   The batch file just contains the following line:
>   "C:\cTAKES_4.0.0\bin\runPiperFile.bat" -p "C:\path\to\piper.piper"
> • The Python script waits for the shell command to complete (i.e., when cTAKES is finished processing)
> • The Python script will then parse the output text files and then continue on with the code
>
> Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The workaround I had created was to save each line of the surgery list column in the dataframe to a different text file to make it easier for when I had to parse the output cTAKES text file. As I had mentioned previously, I would like to have just 1 input text file and 1 output text file (as long as the output file can be easily parsed by Python).
>
> Regarding my coding background, I don't have much background in Java. However, a few years ago, I had no knowledge of Python either, but I was able to teach myself while in medical school.
>
> A few more questions for you...
> 1.) Should I save the code you posted in the following location as a .jar file?
> C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
>
> 2.) Should I replace "add CuiLookupLister" with "add SentenceFirstCuiWriter" in the piper file or do I need both?
>
> 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it leave a blank, N/A, or NaN value? Having any of these values would definitely help when I have Python parse the output text file. When I have Python read the output text file, I would have it delete any dataframe rows with NaN or N/A in the CUI column.
>
> Thank you very much for your assistance!
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <[email protected]> wrote:
>
>> Hi Ryan,
>>
>> Here is some code for a writer that will do what you want.
>> To use it, get rid of those first two lines in the piper that I sent (set, reader).
>> The default reader will work just fine, and it will allow you to process multiple surgery lists in one run.
>>
>> Then just add SentenceFirstCuiWriter to the end of your piper.
>>
>> Sean
>>
>>
>> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>>
>>    public void writeFile( final JCas jCas, final String outputDir,
>>                           final String documentId, final String fileName ) throws IOException {
>>       // Output goes to <fileName>_cui.txt in the output directory.
>>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>>       // Group the procedure mentions by the sentence that covers them.
>>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
>>       // Order the groups by the position of their sentence.
>>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>>             = sentenceMap.entrySet()
>>                          .stream()
>>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
>>                          .map( Map.Entry::getValue )
>>                          .collect( Collectors.toList() );
>>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
>>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
>>             // Take the procedure mention that begins earliest in the sentence.
>>             ProcedureMention firstProcedure
>>                   = procedures.stream()
>>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
>>                               .orElse( null );
>>             if ( firstProcedure != null ) {
>>                String cui
>>                      = OntologyConceptUtil.getCuis( firstProcedure )
>>                                           .stream()
>>                                           .findFirst()
>>                                           .orElse( "" );
>>                // Nothing is written when no CUI is found.
>>                if ( !cui.isEmpty() ) {
>>                   writer.write( cui + "\n" );
>>                }
>>             }
>>          }
>>       }
>>    }
>> }
>>
>> ________________________________________
>> From: Ryan Young <[email protected]>
>> Sent: Monday, March 23, 2020 11:02 AM
>> To: [email protected]
>> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]
>>
>> * External Email - Caution *
>>
>>
>> Hello,
>>
>> I am a medical student who happened to come across cTAKES for a project I am working on. What I would like to do is take a list of surgeries in a text file and have cTAKES output what it determines to be the best UMLS code (CUI) for that particular line.
>>
>> Each line of the text file is independent of the others (i.e., each line should be read and analyzed separately). For example, here's my list of the surgeries (Surgery_List.txt):
>> Colonoscopy with Polypectomy
>> Esophagogastroduodenoscopy Colonoscopy
>> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>>
>> When I run the piper file (see below), I get the following output:
>> Colonoscopy with Polypectomy
>> "Colonoscopy"
>> Procedure
>> C0009378 colonoscopy
>> "Polypectomy"
>> Procedure
>> C0521210 Resection of polyp
>>
>> Esophagogastroduodenoscopy Colonoscopy
>> "Esophagogastroduodenoscopy"
>> Procedure
>> C0079304 Esophagogastroduodenoscopy
>> "Colonoscopy"
>> Procedure
>> C0009378 colonoscopy
>>
>> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>> "Esophagogastroduodenoscopy"
>> Procedure
>> C0079304 Esophagogastroduodenoscopy
>> "Endoscopic ultrasound"
>> Procedure
>> C0376443 Endoscopic Ultrasound
>> "Endoscopic"
>> Procedure
>> C0014245 Endoscopy (procedure)
>> "ultrasound"
>> Procedure
>> C0041618 Ultrasonography
>> "Fine needle aspiration"
>> Procedure
>> C1510483 Fine needle aspiration biopsy
>> "aspiration"
>> Procedure
>> C0349707 Aspiration-action
>>
>> Here's the piper file I have been using:
>> reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
>> load DefaultTokenizerPipeline.piper
>>
>> SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
>> add ContextDependentTokenizerAnnotator
>> add org.apache.ctakes.necontexts.ContextAnnotator
>> addDescription POSTagger
>> load ChunkerSubPipe.piper
>> set ctakes.umlsuser=my_username ctakes.umlspw=my_password
>> add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
>>
>> DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml
>>
>> LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
>> add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"
>>
>> The workaround I have developed is as follows...
>> 1.) Save each line of Surgery_List.txt to separate text files
>> 2.) Use a Python script to parse each individual text file to extract the first UMLS code (CUI) given in the text file
>>
>> The above method works fine when there are only 10 lines, but not so well when there are 40,000 lines in Surgery_List.txt.
>>
>> Ideally, I would like for Fast Dictionary Lookup to just return the top result for each line of Surgery_List.txt. For example, Output.txt would look just like this:
>> C0009378
>> C0079304
>> C0079304
>>
>> Just for reference, here's how UMLS codes correspond between Surgery_List.txt and Output.txt:
>> C0009378 --> Colonoscopy with Polypectomy
>> C0079304 --> Esophagogastroduodenoscopy Colonoscopy
>> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
>>
>> Is there something I can add to the piper file to make this happen?
>>
>> Currently, I have the cTAKES user version installed, but I could install the developer version if need be. I would just need a little guidance on which Java script I would need to modify to get the desired results.
>>
>> Thank You,
>>
>> Ryan Young
>> MD/MBA Candidate
>> Jacobs School of Medicine & Biomedical Sciences
>>
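
A note on the "Could not find or load main class" error reported at the top of this thread: that message generally means the JVM could not find the named class on its classpath. The directory you cd into only matters if it is on the classpath, and the classpath has to point at compiled .class files or jars, not at .java sources. The following is a minimal diagnostic sketch, not part of cTAKES: ClasspathCheck is a hypothetical helper name, and the default class name is simply the one from the command quoted above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic helper (not part of cTAKES): lists the classpath the
 * JVM was started with and reports whether a given class can be found on it.
 * Declared without "public" so the source file name does not have to match.
 */
final class ClasspathCheck {

   /** Returns true if the named class can be located on the current classpath. */
   static boolean canLoad( final String className ) {
      try {
         // initialize=false: only locate the class, do not run its static initializers.
         Class.forName( className, false, ClasspathCheck.class.getClassLoader() );
         return true;
      } catch ( ClassNotFoundException | LinkageError e ) {
         return false;
      }
   }

   /** Splits a classpath string on the platform separator (";" on Windows, ":" elsewhere). */
   static List<String> entries( final String classpath ) {
      final List<String> result = new ArrayList<>();
      for ( String entry : classpath.split( File.pathSeparator ) ) {
         if ( !entry.isEmpty() ) {
            result.add( entry );
         }
      }
      return result;
   }

   public static void main( final String[] args ) {
      // Default target is the class name from the failing command in this thread.
      final String target = args.length > 0
                            ? args[ 0 ]
                            : "org.apache.ctakes.core.pipeline.PiperFileRunner";
      for ( String entry : entries( System.getProperty( "java.class.path", "" ) ) ) {
         System.out.println( "classpath entry: " + entry );
      }
      System.out.println( target + ( canLoad( target ) ? " was found" : " was NOT found" ) );
   }
}
```

Assuming the compiled classes and dependency jars sit in placeholder locations such as C:\path\to\classes and C:\path\to\lib (these are not the real cTAKES build paths), it could be run on Windows along the lines of: java -Xmx3g -cp ".;C:\path\to\classes;C:\path\to\lib\*" ClasspathCheck org.apache.ctakes.core.pipeline.PiperFileRunner. The -Xmx value is only an example. If the class is reported as found, the same -cp value should in principle let java start PiperFileRunner directly.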
