Actually, we are working on a similar tool that compares each pipeline's
output to the human adjudicated standard for the set we tested against.  I
didn't mention it before because the tool isn't complete yet, but initial
results for the set (excluding annotations marked as "CUI-less") were as
follows:

Human adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human adjudicated standard:
UMLSProcessor        2245
FastUMLSProcessor     215
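
For anyone who wants these counts as rates, here is a minimal sketch that divides each matched count by the gold-standard total. Calling the ratio "recall" is my own framing, not something the tool reports, and it assumes each matched annotation corresponds to exactly one human adjudicated annotation. The names GOLD_TOTAL, MATCHED and recall are made up for this example.

```python
# Counts quoted from the figures above.
GOLD_TOTAL = 4591  # human adjudicated annotations, excluding CUI-less
MATCHED = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}


def recall(matched: int, gold_total: int) -> float:
    """Fraction of gold-standard annotations that a pipeline reproduced."""
    return matched / gold_total


for name, hits in MATCHED.items():
    print(f"{name}: {recall(hits, GOLD_TOTAL):.1%}")
```

Under that assumption the figures work out to roughly 48.9% for the UMLSProcessor and 4.7% for the FastUMLSProcessor. These are preliminary, since the tool isn't complete yet.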

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <pei.c...@childrens.harvard.edu>
wrote:
>
> Bruce,
> Thanks for this; very useful.
> Perhaps Sean Finan can comment more, but it's also probably worth
> comparing against an adjudicated, human-annotated gold standard.
>
> --Pei
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> Sent: Thursday, December 18, 2014 1:45 PM
> To: dev@ctakes.apache.org
> Subject: cTakes Annotation Comparison
>
> With the recent release of cTakes 3.2.1, we were very interested in
> checking for any differences in annotations between using the
> AggregatePlaintextUMLSProcessor pipeline and the
> AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes
> with its associated set of UMLS resources.
>
> We chose to use the SHARE 14-a-b Training data that consists of 199
> documents (Discharge 61, ECG 54, Echo 42, and Radiology 42) as the basis
> for the comparison.
>
> We decided to share a summary of the results with the development
> community.
>
> Documents Processed: 199
>
> Processing Time:
> UMLSProcessor        2,439 seconds
> FastUMLSProcessor    1,837 seconds
>
> Total Annotations Reported:
> UMLSProcessor        20,365 annotations
> FastUMLSProcessor     8,284 annotations
>
> Annotation Comparisons:
> Annotations common to both sets:                      3,940
> Annotations reported only by the UMLSProcessor:      16,425
> Annotations reported only by the FastUMLSProcessor:   4,344
>
>
> If anyone is interested, the following was our test procedure:
>
> We used the UIMA CPE to process the document set twice, once using the
> AggregatePlaintextUMLSProcessor pipeline and once using the
> AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile
> CAS consumer to write the results to output files.
>
> We used a tool we recently developed to analyze and compare the
> annotations generated by the two pipelines. The tool compares the two
> outputs for each file and reports any differences in the annotations
> (MedicationMention, SignSymptomMention, ProcedureMention,
> AnatomicalSiteMention, and DiseaseDisorderMention) between the two
> output sets. The tool reports the
> number of 'matches' and 'misses' between each annotation set. A 'match' is
> defined as the presence of an identified source text interval with its
> associated CUI appearing in both annotation sets. A 'miss' is defined as
> the presence of an identified source text interval and its associated CUI
> in one annotation set, but no matching identified source text interval and
> CUI in the other. The tool also reports the total number of annotations
> (source text intervals with associated CUIs) reported in each annotation
> set. The compare tool is in our GitHub repository at
> https://github.com/perfectsearch/cTAKES-compare
>
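
The match/miss definition in the quoted procedure can be sketched with plain set operations. This is only an illustration of the definition, not the code in the cTAKES-compare repository. The Annotation type, the compare function, the offsets, and the placeholder CUIs are all hypothetical, and the sketch assumes the annotations have already been extracted from the CAS output files.

```python
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    """One reported mention: a source text interval plus its CUI."""
    begin: int
    end: int
    cui: str


def compare(first: Iterable[Annotation], second: Iterable[Annotation]) -> dict:
    """Split two annotation sets into matches and one-sided misses.

    A match is the same (begin, end, cui) triple appearing in both sets;
    a miss is a triple that appears in only one of them.
    """
    a, b = set(first), set(second)
    return {
        "matches": a & b,
        "only_first": a - b,
        "only_second": b - a,
    }


# Made-up offsets and placeholder CUIs, purely for illustration.
umls = [Annotation(0, 8, "C0000001"), Annotation(15, 22, "C0000002")]
fast = [Annotation(0, 8, "C0000001"), Annotation(15, 22, "C0000003")]

result = compare(umls, fast)
print(len(result["matches"]), len(result["only_first"]), len(result["only_second"]))
```

Note that building sets collapses exact duplicate annotations, which the real tool may or may not do. The reported totals are consistent with this view: 3,940 + 16,425 = 20,365 and 3,940 + 4,344 = 8,284.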
