Hi Ryan,

Here is some code for a writer that will do what you want.
To use it, remove the first two lines (set, reader) from the piper that I sent.
The default reader will work just fine, and it will allow you to process
multiple surgery lists in one run.

Then just add SentenceFirstCuiWriter to the end of your piper.

Sean


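import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
// The cTAKES classes used below (AbstractJCasFileWriter, Sentence, ProcedureMention,
// DefaultAspanComparator, OntologyConceptUtil) also need imports; the exact packages
// depend on your cTAKES version.

public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

   @Override
   public void writeFile( final JCas jCas, final String outputDir,
                          final String documentId, final String fileName ) throws IOException {
      File cuiFile = new File( outputDir, fileName + "_cui.txt" );
      // Map each sentence (one line of the surgery list) to the procedures it covers.
      Map<Sentence, Collection<ProcedureMention>> sentenceMap
            = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
      // Order the procedure collections by the position of their sentence in the text.
      List<Collection<ProcedureMention>> sortedSentenceProcedures
            = sentenceMap.entrySet()
                         .stream()
                         .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                         .map( Map.Entry::getValue )
                         .collect( Collectors.toList() );
      try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
         for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
            // The first procedure in a sentence is the one with the lowest begin offset.
            ProcedureMention firstProcedure
                  = procedures.stream()
                              .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                              .orElse( null );
            if ( firstProcedure != null ) {
               String cui
                     = OntologyConceptUtil.getCuis( firstProcedure )
                                          .stream()
                                          .findFirst()
                                          .orElse( "" );
               // Sentences with no procedure or no CUI produce no output line.
               if ( !cui.isEmpty() ) {
                  writer.write( cui + "\n" );
               }
            }
         }
      }
   }
}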
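
In case it helps to see the selection logic on its own, here is a minimal sketch in plain Java with no cTAKES dependency. FirstCuiSketch, Mention and firstCuis are hypothetical names made up for illustration, and the offsets are only illustrative. In the writer above, the real offsets and CUIs come from ProcedureMention and OntologyConceptUtil.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FirstCuiSketch {

   // Hypothetical stand-in for a ProcedureMention: a begin offset plus its CUIs.
   static final class Mention {
      final int begin;
      final List<String> cuis;

      Mention( final int begin, final String... cuis ) {
         this.begin = begin;
         this.cuis = Arrays.asList( cuis );
      }
   }

   // For each sentence, in order, take the mention with the lowest begin offset
   // and keep its first CUI. Sentences without a mention or a CUI are skipped.
   static List<String> firstCuis( final List<List<Mention>> sentences ) {
      final List<String> result = new ArrayList<>();
      for ( List<Mention> mentions : sentences ) {
         mentions.stream()
                 .min( Comparator.comparingInt( m -> m.begin ) )
                 .flatMap( m -> m.cuis.stream().findFirst() )
                 .filter( cui -> !cui.isEmpty() )
                 .ifPresent( result::add );
      }
      return result;
   }

   public static void main( final String[] args ) {
      // The three surgery lines from the original question, mentions deliberately unordered.
      final List<List<Mention>> sentences = Arrays.asList(
            Arrays.asList( new Mention( 17, "C0521210" ), new Mention( 0, "C0009378" ) ),
            Arrays.asList( new Mention( 0, "C0079304" ), new Mention( 27, "C0009378" ) ),
            Arrays.asList( new Mention( 0, "C0079304" ), new Mention( 32, "C0376443" ) ) );
      // Prints [C0009378, C0079304, C0079304]
      System.out.println( firstCuis( sentences ) );
   }
}
```

Note that a line with no recognized procedure is skipped rather than written as a blank, so the output can have fewer lines than the input.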

________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Monday, March 23, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]


Hello,

I am a medical student who happened to come across cTAKES for a project I
am working on. What I would like to do is take a list of surgeries in a
text file and have cTAKES output what it determines to be the best UMLS
code (CUI) for that particular line.

Each line of the text file is independent of the others (i.e., each line
should be read and analyzed separately). For example, here's my list of the
surgeries (Surgery_List.txt):
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

When I run the piper file (see below), I get the following output:
Colonoscopy with Polypectomy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy
"Polypectomy"
  Procedure
  C0521210 Resection of polyp

Esophagogastroduodenoscopy Colonoscopy
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy

Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Endoscopic ultrasound"
  Procedure
  C0376443 Endoscopic Ultrasound
"Endoscopic"
  Procedure
  C0014245 Endoscopy (procedure)
"ultrasound"
  Procedure
  C0041618 Ultrasonography
"Fine needle aspiration"
  Procedure
  C1510483 Fine needle aspiration biopsy
"aspiration"
  Procedure
  C0349707 Aspiration-action

Here's the piper file I have been using:
reader org.apache.ctakes.core.cr.FileTreeReader
InputDirectory="C:\path\to\input\folder"
load DefaultTokenizerPipeline.piper
SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
add ContextDependentTokenizerAnnotator
add org.apache.ctakes.necontexts.ContextAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
set ctakes.umlsuser=my_username ctakes.umlspw=my_password
add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml
LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
add property.plaintext.PropertyTextWriterFit
OutputDirectory="C:\path\to\output\folder"

The workaround I have developed is as follows...
1.) Save each line of Surgery_List.txt to separate text files
2.) Use a Python script to parse each individual text file to extract the
first UMLS code (CUI) given in the text file

The above method works fine when there are only 10 lines, but not so well
when there are 40,000 lines in Surgery_List.txt.

Ideally, I would like for Fast Dictionary Lookup to just return the top
result for each line of Surgery_List.txt. For example, Output.txt would
look just like this:
C0009378
C0079304
C0079304

Just for reference here's how UMLS codes correspond between
Surgery_List.txt and Output.txt:
C0009378 --> Colonoscopy with Polypectomy
C0079304 --> Esophagogastroduodenoscopy Colonoscopy
C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

Is there something I can add to the piper file to make this happen?

Currently, I have the cTAKES user version installed, but I could install
the developer version if need be. I would just need a little guidance on
which Java class I would need to modify to get the desired results.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences
