Hi Ryan,

Here is some code for a writer that will do what you want. To use it, remove the first two lines (set, reader) from the piper that I sent. The default reader will work just fine, and it will allow you to process multiple surgery lists in one run.
Then just add SentenceFirstCuiWriter to the end of your piper.

Sean

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.util.Collection;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    // The cTAKES and uimaFIT imports (AbstractJCasFileWriter, JCas, JCasUtil, Sentence,
    // ProcedureMention, OntologyConceptUtil, DefaultAspanComparator) are not listed here;
    // their packages depend on your cTAKES version.

    public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

       @Override
       public void writeFile( final JCas jCas,
                              final String outputDir,
                              final String documentId,
                              final String fileName ) throws IOException {
          // One output file per input document: <fileName>_cui.txt
          File cuiFile = new File( outputDir, fileName + "_cui.txt" );
          // Group the procedure mentions by the sentence that covers them.
          Map<Sentence, Collection<ProcedureMention>> sentenceMap
                = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
          // Put the groups in document order by sorting on the sentence span.
          List<Collection<ProcedureMention>> sortedSentenceProcedures
                = sentenceMap.entrySet()
                             .stream()
                             .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                             .map( Map.Entry::getValue )
                             .collect( Collectors.toList() );
          try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
             for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
                // The first procedure in a sentence is the one with the smallest begin offset.
                ProcedureMention firstProcedure
                      = procedures.stream()
                                  .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                                  .orElse( null );
                if ( firstProcedure != null ) {
                   String cui = OntologyConceptUtil.getCuis( firstProcedure )
                                                   .stream()
                                                   .findFirst()
                                                   .orElse( "" );
                   // Sentences whose first procedure has no CUI produce no output line.
                   if ( !cui.isEmpty() ) {
                      writer.write( cui + "\n" );
                   }
                }
             }
          }
       }

    }

________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Monday, March 23, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Hello,

I am a medical student who happened to come across cTAKES for a project I am working on. What I would like to do is take a list of surgeries in a text file and have cTAKES output what it determines to be the best UMLS code (CUI) for each line. Each line of the text file is independent of the others (i.e., each line should be read and analyzed separately).
For example, here's my list of the surgeries (Surgery_List.txt):

    Colonoscopy with Polypectomy
    Esophagogastroduodenoscopy Colonoscopy
    Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

When I run the piper file (see below), I get the following output:

    Colonoscopy with Polypectomy
    "Colonoscopy" Procedure C0009378 colonoscopy
    "Polypectomy" Procedure C0521210 Resection of polyp

    Esophagogastroduodenoscopy Colonoscopy
    "Esophagogastroduodenoscopy" Procedure C0079304 Esophagogastroduodenoscopy
    "Colonoscopy" Procedure C0009378 colonoscopy

    Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
    "Esophagogastroduodenoscopy" Procedure C0079304 Esophagogastroduodenoscopy
    "Endoscopic ultrasound" Procedure C0376443 Endoscopic Ultrasound
    "Endoscopic" Procedure C0014245 Endoscopy (procedure)
    "ultrasound" Procedure C0041618 Ultrasonography
    "Fine needle aspiration" Procedure C1510483 Fine needle aspiration biopsy
    "aspiration" Procedure C0349707 Aspiration-action

Here's the piper file I have been using:

    reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
    load DefaultTokenizerPipeline.piper SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
    add ContextDependentTokenizerAnnotator
    add org.apache.ctakes.necontexts.ContextAnnotator
    addDescription POSTagger
    load ChunkerSubPipe.piper
    set ctakes.umlsuser=my_username ctakes.umlspw=my_password
    add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
    add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"

The workaround I have developed is as follows:

1.) Save each line of Surgery_List.txt to a separate text file
2.) Use a Python script to parse each individual output file and extract the first UMLS code (CUI) given in it

The above method works fine when there are only 10 lines, but not so well when there are 40,000 lines in Surgery_List.txt. Ideally, I would like Fast Dictionary Lookup to return just the top result for each line of Surgery_List.txt. For example, Output.txt would look just like this:

    C0009378
    C0079304
    C0079304

Just for reference, here's how the UMLS codes in Output.txt correspond to the lines of Surgery_List.txt:

    C0009378 --> Colonoscopy with Polypectomy
    C0079304 --> Esophagogastroduodenoscopy Colonoscopy
    C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

Is there something I can add to the piper file to make this happen? Currently, I have the cTAKES user version installed, but I could install the developer version if need be. I would just need a little guidance on which Java file I would need to modify to get the desired results.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences
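
The selection rule used by SentenceFirstCuiWriter (take the procedure mention with the smallest begin offset in each sentence, then its first CUI) can be tried out without a cTAKES install. The following is only a minimal sketch under stated assumptions: Mention is a hypothetical stand-in for ProcedureMention, the offsets and CUIs are entered by hand from the example output in this thread, and FirstCuiPerLine, firstCuis and sampleLines are illustrative names that are not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FirstCuiPerLine {

    /** Hypothetical stand-in for a cTAKES ProcedureMention: a begin offset and its CUIs. */
    static final class Mention {
        final int begin;
        final List<String> cuis;

        Mention(int begin, String... cuis) {
            this.begin = begin;
            this.cuis = Arrays.asList(cuis);
        }

        int getBegin() {
            return begin;
        }
    }

    /**
     * For each line, pick the mention with the smallest begin offset and keep its first CUI.
     * Lines with no mention, or whose first mention has no CUI, contribute nothing,
     * which mirrors the behavior of the writer in this thread.
     */
    static List<String> firstCuis(List<List<Mention>> mentionsPerLine) {
        List<String> result = new ArrayList<>();
        for (List<Mention> mentions : mentionsPerLine) {
            mentions.stream()
                    .min(Comparator.comparingInt(Mention::getBegin))
                    .flatMap(first -> first.cuis.stream().findFirst())
                    .ifPresent(result::add);
        }
        return result;
    }

    /** Hand-entered data for the three example lines; mentions are deliberately out of order. */
    static List<List<Mention>> sampleLines() {
        List<Mention> line1 = Arrays.asList(
                new Mention(17, "C0521210"),   // "Polypectomy"
                new Mention(0, "C0009378"));   // "Colonoscopy"
        List<Mention> line2 = Arrays.asList(
                new Mention(27, "C0009378"),   // "Colonoscopy"
                new Mention(0, "C0079304"));   // "Esophagogastroduodenoscopy"
        List<Mention> line3 = Arrays.asList(
                new Mention(32, "C0376443"),   // "Endoscopic ultrasound"
                new Mention(0, "C0079304"));   // "Esophagogastroduodenoscopy"
        return Arrays.asList(line1, line2, line3);
    }

    public static void main(String[] args) {
        // Prints one CUI per line: C0009378, C0079304, C0079304
        for (String cui : firstCuis(sampleLines())) {
            System.out.println(cui);
        }
    }
}
```

Because lines without a usable CUI are skipped, the output can have fewer lines than the input. If a one-to-one correspondence with Surgery_List.txt matters, writing an empty line in that case would be one way to keep the two files aligned.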