Hi Ryan,

Your piper file has a lot of things that you don't need.

Try this:

// Set the location of the input file and use a reader that treats each line like a different document.
set InputFileName=C:/path/to/input/file.txt
reader LinesFromFileCollectionReader

// Use the default section, sentence, token pipeline
load DefaultTokenizerPipeline

// Find Parts of Speech for dictionary lookup
add POSTagger

// Add Dictionary Lookup
load DictionarySubPipe

// This is a temporary writer ...
add CuiLookupLister


The CuiLookupLister doesn't do exactly what you want; for that you need a custom
writer.
The LinesFromFileCollectionReader is not ideal, but it does do what you want.
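
As a rough illustration of what such a custom writer would have to decide, here is a minimal Java sketch of a "one CUI per line" rule: take the mention that starts earliest in the line and, if two start at the same offset, the longer span. The Mention record, the pickCui method and the tie-break are assumptions for illustration only; in a real writer the offsets and CUIs would presumably come from the concept annotations cTAKES adds to the CAS, and that wiring is not shown here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class FirstCuiPicker {

    /** Hypothetical stand-in for a concept mention: offsets within one line plus its CUI. */
    record Mention(int begin, int end, String cui) {
        int length() {
            return end - begin;
        }
    }

    /** Earliest-starting mention wins; if two start at the same offset, the longer span wins. */
    static Optional<String> pickCui(List<Mention> mentions) {
        Comparator<Mention> byStart = Comparator.comparingInt(Mention::begin);
        Comparator<Mention> longerFirst = Comparator.comparingInt(Mention::length).reversed();
        return mentions.stream()
                .min(byStart.thenComparing(longerFirst))
                .map(Mention::cui);
    }

    public static void main(String[] args) {
        // "Colonoscopy with Polypectomy": offsets counted by hand, CUIs taken from the output in this thread.
        List<Mention> line = List.of(
                new Mention(17, 28, "C0521210"),  // "Polypectomy"
                new Mention(0, 11, "C0009378"));  // "Colonoscopy"
        System.out.println(pickCui(line).orElse("NONE"));  // prints C0009378
    }
}
```

With the tie-break, "Endoscopic ultrasound" (C0376443) would be chosen over "Endoscopic" (C0014245) when both start at the same position; whether that is the right preference for surgery names is a judgment call.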

Could you please tell me how you are running this?  Are you using the Submitter
GUI, the PiperFileRunner class, a shell script or something else?
Also, how comfortable are you with Java?

I will scribble up something more and send it in a minute ...

Sean

________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Monday, March 23, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) 
[EXTERNAL]


Hello,

I am a medical student who happened to come across cTAKES for a project I
am working on. What I would like to do is take a list of surgeries in a
text file and have cTAKES output what it determines to be the best UMLS
code (CUI) for that particular line.

Each line of the text file is independent of the others (i.e., each line
should be read and analyzed separately). For example, here's my list of the
surgeries (Surgery_List.txt):
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

When I run the piper file (see below), I get the following output:
Colonoscopy with Polypectomy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy
"Polypectomy"
  Procedure
  C0521210 Resection of polyp

Esophagogastroduodenoscopy Colonoscopy
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy

Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Endoscopic ultrasound"
  Procedure
  C0376443 Endoscopic Ultrasound
"Endoscopic"
  Procedure
  C0014245 Endoscopy (procedure)
"ultrasound"
  Procedure
  C0041618 Ultrasonography
"Fine needle aspiration"
  Procedure
  C1510483 Fine needle aspiration biopsy
"aspiration"
  Procedure
  C0349707 Aspiration-action

Here's the piper file I have been using:
reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
load DefaultTokenizerPipeline.piper SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
add ContextDependentTokenizerAnnotator
add org.apache.ctakes.necontexts.ContextAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
set ctakes.umlsuser=my_username ctakes.umlspw=my_password
add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"

The workaround I have developed is as follows...
1.) Save each line of Surgery_List.txt to separate text files
2.) Use a Python script to parse each cTAKES output file and extract the
first UMLS code (CUI) listed in it

The above method works fine when there are only 10 lines, but not so well
when there are 40,000 lines in Surgery_List.txt.
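
The output shown above already puts each input line at the top of its own blank-line-separated block, so one alternative to splitting the input into 40,000 files is to run the pipeline once and pull the first CUI out of every block. Below is a minimal Java sketch that assumes exactly that layout: one block per input line, with CUI lines indented and starting with "C" plus seven digits. The class and method names are made up for illustration, and if sentence detection ever splits or merges input lines, the blocks will no longer line up one-to-one with Surgery_List.txt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutputBlockParser {

    // Assumed CUI line layout, based on the sample output: indentation, then "C" + 7 digits.
    private static final Pattern CUI_LINE = Pattern.compile("^\\s+(C\\d{7})\\b");

    /**
     * Splits the writer output on blank lines and returns the first CUI found in each block,
     * or "NONE" for a block without one. The first line of a block is the input text and is skipped.
     */
    static List<String> firstCuiPerBlock(String output) {
        List<String> cuis = new ArrayList<>();
        if (output.isBlank()) {
            return cuis;
        }
        for (String block : output.strip().split("\\R\\s*\\R")) {
            String[] lines = block.split("\\R");
            String first = "NONE";
            for (int i = 1; i < lines.length; i++) {
                Matcher m = CUI_LINE.matcher(lines[i]);
                if (m.find()) {
                    first = m.group(1);
                    break;
                }
            }
            cuis.add(first);
        }
        return cuis;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OutputBlockParser path/to/writer/output.txt
        String text = Files.readString(Path.of(args[0]));
        firstCuiPerBlock(text).forEach(System.out::println);
    }
}
```

Run once over the single output file, this prints one code per block, and the number of lines it prints can be compared against the line count of Surgery_List.txt as a sanity check.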

Ideally, I would like for Fast Dictionary Lookup to just return the top
result for each line of Surgery_List.txt. For example, Output.txt would
look just like this:
C0009378
C0079304
C0079304

Just for reference here's how UMLS codes correspond between
Surgery_List.txt and Output.txt:
C0009378 --> Colonoscopy with Polypectomy
C0079304 --> Esophagogastroduodenoscopy Colonoscopy
C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
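
Output.txt is only useful if line N still corresponds to line N of Surgery_List.txt, so it may be worth checking the counts before writing and keeping the source text next to each code, in the same spirit as the mapping above. A minimal Java sketch follows; the class name, the "NONE" placeholder for lines with no concept, and the tab-separated layout are assumptions for illustration, not something cTAKES produces on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AlignedOutputWriter {

    /**
     * Pairs each input line with its CUI as "CUI<TAB>line". A null CUI becomes "NONE" so that
     * row N of the output always corresponds to line N of the input.
     */
    static List<String> alignedRows(List<String> inputLines, List<String> cuis) {
        if (inputLines.size() != cuis.size()) {
            throw new IllegalArgumentException(
                    "Expected " + inputLines.size() + " CUIs but got " + cuis.size()
                    + "; the results are out of step with the input.");
        }
        List<String> rows = new ArrayList<>(inputLines.size());
        for (int i = 0; i < inputLines.size(); i++) {
            String cui = cuis.get(i) == null ? "NONE" : cuis.get(i);
            rows.add(cui + "\t" + inputLines.get(i));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<String> input = List.of(
                "Colonoscopy with Polypectomy",
                "Esophagogastroduodenoscopy Colonoscopy");
        List<String> cuis = List.of("C0009378", "C0079304");
        // Files.write(Path, Iterable) writes one element per line, UTF-8 encoded.
        Files.write(Path.of("Output.txt"), alignedRows(input, cuis));
    }
}
```

Failing loudly on a count mismatch is deliberate: with 40,000 lines, a single dropped or split line would otherwise shift every code after it without any visible error.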

Is there something I can add to the piper file to make this happen?

Currently, I have the cTAKES user version installed, but I could install
the developer version if need be. I would just need a little guidance on
which Java class I would need to modify to get the desired results.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences
