Actually, we are working on a similar tool that compares each pipeline's
output to the human adjudicated standard for the set we tested against.  I
didn't mention it before because the tool isn't complete yet, but initial
results for the set (excluding annotations marked as "CUI-less") were as
follows:

Human adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human adjudicated standard
UMLSProcessor          2245
FastUMLSProcessor       215
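
To put those counts in proportion, here is a rough sketch (not part of our
tool) that expresses each match count as a fraction of the gold standard.
The function name `recall` and the constants are illustrative choices made
for this example, and it assumes the match counts and the 4591 total refer
to the same filtered set:

```python
def recall(matched: int, gold_total: int) -> float:
    """Fraction of gold-standard annotations that a pipeline reproduced."""
    if gold_total <= 0:
        raise ValueError("gold_total must be positive")
    return matched / gold_total


# Counts taken from the preliminary results above (CUI-less excluded).
GOLD_TOTAL = 4591
MATCHED = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

for name, matched in MATCHED.items():
    print(f"{name}: {recall(matched, GOLD_TOTAL):.1%} of the gold standard matched")
```

This only measures how much of the gold standard was found; it says nothing
about how many of the other reported annotations are correct.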

Bruce Tietjen
Senior Software Engineer
IMAT Solutions
Mobile: 801.634.1547

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei wrote:
> Bruce,
> Thanks for this-- very useful.
> Perhaps Sean Finan can comment more,
> but it's also probably worth comparing to an adjudicated, human-annotated
> gold standard.
> --Pei
> -----Original Message-----
> From: Bruce Tietjen
> Sent: Thursday, December 18, 2014 1:45 PM
> To:
> Subject: cTakes Annotation Comparison
> With the recent release of cTakes 3.2.1, we were very interested in
> checking for any differences in annotations between using the
> AggregatePlaintextUMLSProcessor pipeline and the
> AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes
> with its associated set of UMLS resources.
> We chose to use the SHARE 14-a-b Training data that consists of 199
> documents (Discharge 61, ECG 54, Echo 42, and Radiology 42) as the basis
> for the comparison.
> We decided to share a summary of the results with the development
> community.
> Documents Processed: 199
> Processing Time:
> UMLSProcessor        2,439 seconds
> FastUMLSProcessor    1,837 seconds
> Total Annotations Reported:
> UMLSProcessor        20,365 annotations
> FastUMLSProcessor     8,284 annotations
> Annotation Comparisons:
> Annotations common to both sets:                       3,940
> Annotations reported only by the UMLSProcessor:       16,425
> Annotations reported only by the FastUMLSProcessor:    4,344
> If anyone is interested, the following was our test procedure:
> We used the UIMA CPE to process the document set twice, once using the
> AggregatePlaintextUMLSProcessor pipeline and once using the
> AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile
> CAS consumer to write the results to output files.
> We used a tool we recently developed to analyze and compare the
> annotations generated by the two pipelines. The tool compares the two
> outputs for each file and reports any differences in the annotations
> (MedicationMention, SignSymptomMention, ProcedureMention,
> AnatomicalSiteMention, and DiseaseDisorderMention) between the two output
> sets. The tool reports the
> number of 'matches' and 'misses' between each annotation set. A 'match' is
> defined as the presence of an identified source text interval with its
> associated CUI appearing in both annotation sets. A 'miss' is defined as
> the presence of an identified source text interval and its associated CUI
> in one annotation set, but no matching identified source text interval and
> CUI in the other. The tool also reports the total number of annotations
> (source text intervals with associated CUIs) reported in each annotation
> set. The compare tool is in our GitHub repository at
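
For anyone who wants to reproduce the counting, the match/miss definition
quoted above can be sketched roughly as follows. This is a minimal
illustration and not the code of the compare tool itself. It assumes each
annotation has already been reduced to a (begin, end, CUI) tuple, and the
names `Annotation` and `compare` are hypothetical, as are the placeholder
CUIs in the example.

```python
from typing import Dict, NamedTuple, Set


class Annotation(NamedTuple):
    """A source text interval together with its associated CUI."""
    begin: int
    end: int
    cui: str


def compare(a: Set[Annotation], b: Set[Annotation]) -> Dict[str, int]:
    """Count matches and misses between two annotation sets.

    A match is the same interval and CUI present in both sets; a miss is
    an interval/CUI pair present in one set with no counterpart in the other.
    """
    return {
        "total_a": len(a),
        "total_b": len(b),
        "matches": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


if __name__ == "__main__":
    # Made-up annotations; the CUIs are placeholders, not real concepts.
    pipeline_a = {Annotation(0, 5, "C0000001"), Annotation(10, 18, "C0000002")}
    pipeline_b = {Annotation(0, 5, "C0000001"), Annotation(10, 18, "C0000003")}
    print(compare(pipeline_a, pipeline_b))
```

Under this definition the same interval with a different CUI counts as a
miss on both sides, and matches plus misses for a set always add up to that
set's total. The figures in the summary are consistent with that:
3,940 + 16,425 = 20,365 and 3,940 + 4,344 = 8,284.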
