Hi Ryan,

Your piper file has a lot of things that you don't need.
Try this:

// set the location of the input file and use a reader that treats each line like a different document.
set InputFileName=C:/path/to/input/file.txt
reader LinesFromFileCollectionReader

// Use the default section, sentence, token pipeline
load DefaultTokenizerPipeline

// Find Parts of Speech for dictionary lookup
add POSTagger

// Add Dictionary Lookup
load DictionarySubPipe

// This is a temporary reader ...
add CuiLookupLister

The CuiLookupLister doesn't do exactly what you want; you would need a custom writer for that. The LinesFromFileCollectionReader is not ideal, but it does do what you want.

Could you please tell me how you are running this? Are you using the submitter GUI, the PiperFileRunner class, a shell script or something else? Also, how comfortable are you with Java?

I will scribble up something more and send it in a minute ...

Sean

________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Monday, March 23, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Hello,

I am a medical student who happened to come across cTAKES for a project I am working on. What I would like to do is take a list of surgeries in a text file and have cTAKES output what it determines to be the best UMLS code (CUI) for that particular line. Each line of the text file is independent of the others (i.e., each line should be read and analyzed separately).

For example, here's my list of the surgeries (Surgery_List.txt):

Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

When I run the piper file (see below), I get the following output:

Colonoscopy with Polypectomy
"Colonoscopy" Procedure C0009378 colonoscopy
"Polypectomy" Procedure C0521210 Resection of polyp
Esophagogastroduodenoscopy Colonoscopy
"Esophagogastroduodenoscopy" Procedure C0079304 Esophagogastroduodenoscopy
"Colonoscopy" Procedure C0009378 colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
"Esophagogastroduodenoscopy" Procedure C0079304 Esophagogastroduodenoscopy
"Endoscopic ultrasound" Procedure C0376443 Endoscopic Ultrasound
"Endoscopic" Procedure C0014245 Endoscopy (procedure)
"ultrasound" Procedure C0041618 Ultrasonography
"Fine needle aspiration" Procedure C1510483 Fine needle aspiration biopsy
"aspiration" Procedure C0349707 Aspiration-action

Here's the piper file I have been using:

reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
load DefaultTokenizerPipeline.piper SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
add ContextDependentTokenizerAnnotator
add org.apache.ctakes.necontexts.ContextAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
set ctakes.umlsuser=my_username ctakes.umlspw=my_password
add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"

The workaround I have developed is as follows...

1.) Save each line of Surgery_List.txt to separate text files
2.) Use a Python script to parse each individual text file to extract the first UMLS code (CUI) given in the text file

The above method works fine when there's only 10 lines, but not so well when there's 40,000 lines in Surgery_List.txt. Ideally, I would like for Fast Dictionary Lookup to just return the top result for each line of Surgery_List.txt. For example, Output.txt would look just like this:

C0009378
C0079304
C0079304

Just for reference here's how UMLS codes correspond between Surgery_List.txt and Output.txt:

C0009378 --> Colonoscopy with Polypectomy
C0079304 --> Esophagogastroduodenoscopy Colonoscopy
C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

Is there something I can add to the piper file to make this happen? Currently, I have the cTAKES user version installed, but I could install the developer version if need be. I would just need a little guidance on which Java script I would need to modify to get the desired results.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences
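
The post-processing step of the workaround above (keep only the first CUI reported for each input line) can be sketched in Python without first splitting Surgery_List.txt into separate files. This is only a rough sketch and not part of cTAKES: the function name first_cui_per_line is hypothetical, and it assumes the writer echoes each input line on a line of its own before listing the concepts found in it, as in the sample output quoted above. If cTAKES skips or rewrites an input line, the alignment would break.

```python
import re

# UMLS CUIs are written as "C" followed by seven digits, e.g. C0009378.
CUI_PATTERN = re.compile(r"\bC\d{7}\b")


def first_cui_per_line(input_lines, output_lines):
    """Return the first CUI reported after each echoed input line.

    Assumption (not verified against cTAKES): every input line appears
    verbatim, on its own line and in the original order, in the output.
    Input lines with no CUI listed after them map to None.
    """
    expected = [line.strip() for line in input_lines if line.strip()]
    results = []
    next_index = 0
    for raw in output_lines:
        text = raw.strip()
        # An echoed input line starts a new record.
        if next_index < len(expected) and text == expected[next_index]:
            results.append(None)
            next_index += 1
            continue
        # Otherwise keep the first CUI seen for the current record only.
        match = CUI_PATTERN.search(text)
        if match and results and results[-1] is None:
            results[-1] = match.group(0)
    return results
```

Called with the lines of Surgery_List.txt and the lines of the combined cTAKES output, the returned list can be written to Output.txt one entry per line.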