RE: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]
Hi James,

Great question. In truth, you may need to run a few times to find out. Doing that with a full pipeline would be tedious, but there is a descriptor in clinical-pipeline named CuisOnlyPlaintextUMLSProcessor.xml that will only obtain UMLS CUIs. It runs ~50,000 notes per hour on my laptop as-is, so I suggest that you test with that AE. It has LVG commented out by default (for speed). Adding LVG will increase the runtime, but it also will (as you know) find a few additional terms. You can try a few configurations without it and then the best option with it. If you want to test the default dictionary lookup, then you can certainly swap the referenced lookup XMLs.

Changes to the fast dictionary configuration are made in two places:
1. The main descriptor: ...-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
2. The resource (dictionary) configuration file: resources/.../fast/cTakesHsql.xml

A few suggestions, in order of impact:

1. I am guessing that the annotations in CLEF are human annotated with longest-length spans only; in other words, "colon cancer" instead of both "colon cancer" and "cancer". To best approximate this style of annotation, edit cTakesHsql.xml in the rareWordConsumer section and change the selected implementation. By default it is DefaultTermConsumer (go figure), but you will want to use the commented-out PrecisionTermConsumer. As the comment in cTakesHsql.xml indicates, DefaultTermConsumer will persist all spans, while PrecisionTermConsumer will persist only the longest overlapping span of any semantic group. Doing this should increase precision and, depending upon how good the annotations are, it should not greatly change recall.

2. Just for kicks, try using SemanticCleanupTermConsumer. It may slightly increase precision, but it also may decrease recall. Hopefully it doesn't do much at all (PrecisionTermConsumer and proper semantic typing in the dictionary should suffice without this term consumer).

3. Especially for task 2 (acronyms and abbreviations), you should try a run with the minimumSpan parameter in UmlsLookupAnnotator.xml set to 2. This changes the minimum allowable character span of a term. The default is 3 to increase precision on acronyms and abbreviations, but decreasing it to 2 may improve recall on the same. The dictionary is not built with anything below 2 characters.

4. On that note (character length), if task 1 does not include acronyms and abbreviations, then you can try increasing the minimum span length above 3 and see if there is a good increase in precision without a significant decrease in recall.

5. Try a few runs with overlapping spans in addition to exact matches. To do this, use OverlapJCasTermAnnotator instead of DefaultJCasTermAnnotator as the annotator implementation. DefaultJCasTermAnnotator is specified in UmlsLookupAnnotator.xml, but I will check in a descriptor for overlap matching. There are additional parameters for that option, but I'll email them after I check in.

6. By default the new lookup uses Sentence as the lookup window. I did this for three reasons: (1) not all terms are within noun phrases, (2) some noun phrases overlapped, causing repeated lookups (in my 3.0 candidate trials), and (3) not all cTAKES noun phrases are accurate. Because the lookup is fast, using a full Sentence for lookup doesn't seem to hurt much. However, you can always switch it back to see if precision is increased enough to warrant the decrease in recall. This is also changed in UmlsLookupAnnotator.xml.

I have run my own tests with the various setups, but I don't want to adversely influence what you run, just in case the trends with the ShARe/CLEF annotations differ.

Sean

-----Original Message-----
From: Masanz, James J.
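Concretely, suggestions 3 and 4 above come down to one configuration parameter in the annotator descriptor. The fragment below is a sketch using the standard UIMA configuration-parameter layout; the surrounding descriptor elements (and the exact line it lands on in your checkout) are omitted, so verify against the shipped UmlsLookupAnnotator.xml:

```xml
<!-- UmlsLookupAnnotator.xml: the minimum allowable term span (suggestion 3).
     Sketch only: this shows the standard UIMA configurationParameterSettings
     nameValuePair form; check the shipped descriptor for exact placement. -->
<nameValuePair>
  <name>minimumSpan</name>
  <value>
    <!-- default is 3; 2 may improve recall on acronyms/abbreviations -->
    <integer>2</integer>
  </value>
</nameValuePair>
```

The term-consumer change of suggestion 1 is simpler still: in cTakesHsql.xml, comment out the line selecting DefaultTermConsumer and uncomment the PrecisionTermConsumer line that ships alongside it.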
[mailto:masanz.ja...@mayo.edu]
Sent: Friday, January 09, 2015 3:57 PM
To: 'dev@ctakes.apache.org'
Subject: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]

Sean (or others),

Of the various configuration options described below, which values/choices would you recommend for best F1 measure for something like the shared CLEF 2013 task? https://sites.google.com/site/shareclefehealth/ I'm looking for something that doesn't have to be the best speed-wise, but that is recommended for optimizing F1 measure.

Regards,
James
dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]
Sean (or others),

Of the various configuration options described below, which values/choices would you recommend for best F1 measure for something like the shared CLEF 2013 task? https://sites.google.com/site/shareclefehealth/ I'm looking for something that doesn't have to be the best speed-wise, but that is recommended for optimizing F1 measure.

Regards,
James

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Friday, December 19, 2014 11:55 AM
To: dev@ctakes.apache.org; kim.eb...@imatsolutions.com
Subject: RE: cTakes Annotation Comparison

Well, I guess that it is time for me to speak up ... I must say that I'm happy that people are showing interest in the fast lookup. I am also happy (sort of) that some concerns are being raised, and that there is now community participation in my little toy. I have some concerns about what people are reporting; it does not coincide with what I have seen at all.

Yesterday I started (without knowing this thread existed) testing a bare-minimum pipeline for CUI extraction. It is just the stripped-down aggregate with only: segment, tokens, sentences, POS, and the fast lookup. The people at Children's wanted to know how fast we could get: 1,196 notes in under 90 seconds on my laptop, with over 210,000 annotations, which is about 175 per note. After reading the thread I decided to run the fast lookup with several configurations. I also ran the default for 10.5 hours. I am comparing the annotations from each system against the human annotations that we have, and I will let everybody know what I find, for better or worse.

The fast lookup does not (out of the box) do the exact same thing as the default. Some things can be configured to make it more closely approximate the default dictionary:

1. Set the minimum annotation span length to 2 (default is 3). This is in desc/[ae]/UmlsLookupAnnotator.xml, line #78. The annotator should then pick up text like "CT" and improve recall, but it will hurt precision.

2. Set the lookup window to LookupWindowAnnotation. This is in desc/[ae]/UmlsLookupAnnotator.xml, lines #65 and #93. The LookupWindowAnnotator will need to be added to the aggregate pipeline AggregatePlaintextFastUMLSProcessor.xml, lines #50 and #172. This will narrow the lookup window and may increase precision, but (in my experience) reduces recall.

3. Allow the (rough) identification of overlapping spans. The default dictionary will often identify text like "metastatic colorectal carcinoma" when that text actually does not exist anywhere in UMLS; it basically ignores "colorectal" and gives the whole span the CUI for "metastatic carcinoma". In this case it is arguably a good thing; in many others it is arguably not so much. There is a class ... lookup2.ae.OverlapJCasTermAnnotator.java that will do the same thing. You can create a new desc/[ae]/*Annotator.xml or just change the annotatorImplementationName in desc/[ae]/UmlsLookupAnnotator.xml, line #25. I will check in a new desc xml (sorry; thought I had) because there are 2 parameters unique to OverlapJCasTermAnnotator.

4. You can play with the OverlapJCasTermAnnotator parameters "consecutiveSkips" and "totalTokenSkips". These control just how lenient you want the overlap tagging to be.

5. Create a new dictionary database. There is a (bit messy) DictionaryTool in sandbox that will let you dump whatever you do or do not want from UMLS into a database. It will also help you clean up or select stored entries as well. There is a lot of garbage in the default dictionary database: repeated terms with caps/no caps ("Cancer", "cancer"), text with metadata ("cancer [finding]"), and text that just clutters ("PhenX: entry for cancer", "1", "2"). The fast lookup database should have most of the SNOMED and RxNorm terms (and synonyms) of interest, but you could always make a new database that is much more inclusive.

The main key to the speed of the fast dictionary lookup is actually ... the key. It is the way that the database is indexed and the lookup by "rare" word instead of "first" word. Everything else can be changed around it and it should still be a faster version. As for the false positives like "Today", that will always be a problem until we have disambiguation; the lookup is basically a glorified grep.

Sean
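The "rare word" indexing Sean credits for the speed can be sketched roughly as follows. This is only an illustration of the idea with hypothetical class and method names, not the real code in ctakes-dictionary-lookup-fast: each multi-word term is indexed under its least frequent token, so a lookup window only pulls in candidate terms whose rare word actually appears there, rather than fanning out on common first words.

```java
import java.util.*;

// Sketch of rare-word-first dictionary lookup (hypothetical names, not the
// actual cTAKES classes). Terms are indexed by their rarest token; a lookup
// window only triggers candidates whose rare token occurs in the window.
class RareWordIndexSketch {

    // rare token -> terms indexed under it
    private final Map<String, List<String>> index = new HashMap<>();
    private final Map<String, Integer> tokenFrequency;

    RareWordIndexSketch(Map<String, Integer> tokenFrequency) {
        this.tokenFrequency = tokenFrequency;
    }

    void addTerm(String term) {
        String rare = null;
        for (String tok : term.split(" ")) {
            if (rare == null || tokenFrequency.getOrDefault(tok, 0)
                    < tokenFrequency.getOrDefault(rare, 0)) {
                rare = tok;   // keep the least frequent token as the key
            }
        }
        index.computeIfAbsent(rare, k -> new ArrayList<>()).add(term);
    }

    // Return indexed terms fully contained in the lookup window.
    List<String> lookup(List<String> windowTokens) {
        List<String> hits = new ArrayList<>();
        String window = " " + String.join(" ", windowTokens) + " ";
        for (String tok : windowTokens) {
            for (String term : index.getOrDefault(tok, List.of())) {
                if (window.contains(" " + term + " ")) {
                    hits.add(term);
                }
            }
        }
        return hits;
    }
}
```

The point of the design, as Sean says, is that everything else (term consumers, span limits, overlap matching) can change around this key without losing the speed advantage.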
Re: cTakes Annotation Comparison
Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my small tests that I've done in a non-systematic way. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today: every time the word "today" appears in any text it is annotated as a medication mention, when it almost never is being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote:

Actually, we are working on a similar tool to compare it to the human adjudicated standard for the set we tested against. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding those marked as CUI-less) were as follows:

Human adjudicated annotations: 4,591 (excluding CUI-less)
Annotations found matching the human adjudicated standard:
  UMLSProcessor      2,245
  FastUMLSProcessor    215

IMAT Solutions http://imatsolutions.com
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare to an adjudicated human annotated gold standard.
--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTAKES 3.2.1, we were very interested in checking for any differences in annotations between using the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTAKES with its associated set of UMLS resources. We chose to use the ShARe 14-a-b training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
  UMLSProcessor       2,439 seconds
  FastUMLSProcessor   1,837 seconds

Total annotations reported:
  UMLSProcessor      20,365 annotations
  FastUMLSProcessor   8,284 annotations

Annotation comparisons:
  Annotations common to both sets:                     3,940
  Annotations reported only by the UMLSProcessor:     16,425
  Annotations reported only by the FastUMLSProcessor:  4,344

If anyone is interested, the following was our test procedure. We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets.

The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as the presence of an identified source text interval with its associated CUI appearing in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) reported in each annotation set.

The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
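Bruce's 'match'/'miss' definition amounts to set operations over (begin, end, CUI) triples. A minimal sketch of that comparison follows; the record and method names are illustrative, not taken from the actual cTAKES-compare tool linked above:

```java
import java.util.*;

// Sketch of the match/miss comparison described above: an annotation is
// reduced to its source text interval plus CUI, and a "match" is a triple
// present in both pipelines' output sets. Illustrative names only.
class AnnotationCompareSketch {

    // (begin, end, cui) identifies an annotation for comparison purposes
    record Span(int begin, int end, String cui) {}

    // Triples present in both sets: Bruce's "matches".
    static Set<Span> matches(Set<Span> a, Set<Span> b) {
        Set<Span> common = new HashSet<>(a);
        common.retainAll(b);
        return common;
    }

    // Triples present in one set but not the other: Bruce's "misses".
    static Set<Span> misses(Set<Span> from, Set<Span> other) {
        Set<Span> only = new HashSet<>(from);
        only.removeAll(other);
        return only;
    }
}
```

As a sanity check on the reported counts: 3,940 matches plus 16,425 UMLSProcessor-only misses gives that pipeline's 20,365 total, and 3,940 + 4,344 = 8,284 for the fast pipeline.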
RE: cTakes Annotation Comparison
We are doing a similar kind of evaluation and will report the results. Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R, and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana
Re: cTakes Annotation Comparison
Guergana,

I'm curious about the number of records that are in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release: https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually having only a single permutation of 1,2,3,4. Typically, though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system.

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results. Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R, and F1 results from the old dictionary lookup and the fast one were similar.
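The permutation corruption Kim describes behaves like an in-place sort of shared lookup state. The sketch below is a hypothetical reconstruction of that failure mode, not the actual cTAKES lookup code (see CTAKES-310 for the real fix): a shared table of token-order permutations is mutated while being used, so after the first document only the sorted order 1,2,3,4 remains detectable.

```java
import java.util.*;

// Hypothetical illustration of the permutation-corruption bug described
// above (not the actual cTAKES code). A table of token-order permutations
// is shared across all documents in a long-running pipeline; the buggy
// step sorts each permutation in place while consulting it.
class PermutationBugSketch {

    // Shared across every document the pipeline processes.
    static final List<int[]> PERMUTATION_TABLE = new ArrayList<>(
            List.of(new int[]{4, 2, 3, 1}, new int[]{2, 1}));

    // Buggy: sorts the shared permutations in place, so after the first
    // document the 4,2,3,1 pattern is gone until the process restarts.
    static List<int[]> permutationsForDocumentBuggy() {
        for (int[] p : PERMUTATION_TABLE) {
            Arrays.sort(p);          // mutates the shared table!
        }
        return PERMUTATION_TABLE;
    }

    // Fixed: work on copies, leaving the shared table intact.
    static List<int[]> permutationsForDocumentFixed() {
        List<int[]> copy = new ArrayList<>();
        for (int[] p : PERMUTATION_TABLE) {
            copy.add(p.clone());
        }
        return copy;
    }
}
```

This also explains why the effect only shows up in long runs: each permutation is corrupted the first time it fires, so early documents look fine and later ones quietly lose matches.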
RE: cTakes Annotation Comparison
Also check out the stats that Sean ran before releasing the new component: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned; I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now, plagued with bugs/issues as you mentioned.)

--Pei
Re: cTakes Annotation Comparison
Thanks Kim, This sounds interesting though I don't totally understand it. Are you saying that extraction performance for a given note depends on which order the note was in the processing queue? If so that's pretty bad! If you (or anyone else who understands this issue) has a concrete example I think that might help me understand what the problem is/was. Even though, as Pei mentioned, we are going to try moving the community to the faster dictionary, I would like to understand better just to help myself avoid issues of this type going forward (and verify the new dictionary doesn't use similar logic). Also, when we finish annotating the sample notes, might we use that as a point of comparison for the two dictionaries? That would get around the issue that not everyone has access to the datasets we used for validation and others are likely not able to share theirs either. And maybe we can replicate the notes if we want to simulate the scenario Kim is talking about with thousands or more notes. Tim On 12/19/2014 10:24 AM, Kim Ebert wrote: Guergana, I'm curious to the number of records that are in your gold standard sets, or if your gold standard set was run through a long running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time as patterns are used the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have less codes returned than the first time. For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release. https://issues.apache.org/jira/browse/CTAKES-310 Depending upon the corpus size, I could see the permutation engine eventually only have a single permutation of 1,2,3,4. 
Typically though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system. [IMAT Solutions]http://imatsolutions.com Kim Ebert Software Engineer [Office:] 801.669.7342 kim.eb...@imatsolutions.commailto:greg.hub...@imatsolutions.com On 12/19/2014 07:05 AM, Savova, Guergana wrote: We are doing a similar kind of evaluation and will report the results. Before we released the Fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary look up and the fast one were similar. Thank you everyone! --Guergana -Original Message- From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, December 19, 2014 9:02 AM To: dev@ctakes.apache.orgmailto:dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my small tests that I've done in a non-systematic way. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today. Every time the word today appears in any text it is annotated as a medication mention when it almost never is being used in that sense. These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project. 
- Dave On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: Actually, we are working on a similar tool to compare it to the human-adjudicated standard for the set we tested against. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding those marked as CUI-less) were as follows: Human-adjudicated annotations: 4591 (excluding CUI-less). Annotations found matching the human-adjudicated standard: UMLSProcessor 2245, FastUMLSProcessor 215. [IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Bruce, Thanks for this-- very useful. Perhaps Sean Finan can comment more, but it's also probably
Re: cTakes Annotation Comparison
worth it to compare to an adjudicated, human-annotated gold standard. --Pei -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Thursday, December 18, 2014 1:45 PM To: dev@ctakes.apache.org Subject: cTakes Annotation Comparison With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between using the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes with its associated set of UMLS resources. We chose to use the ShARe 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42 and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.
Documents Processed: 199
Processing Time: UMLSProcessor 2,439 seconds; FastUMLSProcessor 1,837 seconds
Total Annotations Reported: UMLSProcessor 20,365 annotations; FastUMLSProcessor 8,284 annotations
Annotation Comparisons:
  Annotations common to both sets: 3,940
  Annotations reported only by the UMLSProcessor: 16,425
  Annotations reported only by the FastUMLSProcessor: 4,344
If anyone is interested, the following was our test procedure: We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as the presence of an identified source text interval with its associated CUI appearing in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) reported in each
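[Editor's note: the exact-match comparison defined above (same source text interval and same CUI in both outputs) can be sketched with plain set arithmetic. The tuple layout and helper name below are hypothetical, not Bruce's actual tool:

```python
def compare_annotations(run_a, run_b):
    """Count matches and misses between two pipeline outputs.
    Each annotation is a (begin, end, cui) tuple; a 'match' needs the
    same span AND the same CUI in both sets, per the definition above."""
    a, b = set(run_a), set(run_b)
    return {
        "common": len(a & b),   # matches: present in both outputs
        "only_a": len(a - b),   # misses: reported only by pipeline A
        "only_b": len(b - a),   # misses: reported only by pipeline B
    }

counts = compare_annotations(
    {(0, 12, "C0009402"), (20, 26, "C0006826")},  # e.g. UMLSProcessor output
    {(0, 12, "C0009402"), (30, 35, "C0018681")},  # e.g. FastUMLSProcessor output
)
print(counts)  # {'common': 1, 'only_a': 1, 'only_b': 1}
```
]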
RE: cTakes Annotation Comparison
Several thoughts: 1. The ShARe corpus annotates only mentions of type Disease/Disorder, and only Anatomical Sites associated with a Disease/Disorder. This is by design. cTAKES annotates all mentions of types Disease/Disorder, Sign/Symptom, Procedure, Medication and Anatomical Site. Therefore you will get MANY more annotations with cTAKES. Eventually the ShARe corpus will be expanded to the other types. 2. Keeping (1) in mind, you can approximately estimate the precision/recall/F1 of cTAKES on the ShARe corpus if you output only mentions of type Disease/Disorder. 3. Could you send us the list of files you use from ShARe to test? We have the corpus and would like to run against it as well. Hope this makes sense... --Guergana -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 1:16 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Our analysis against the human-adjudicated gold standard from this ShARe corpus uses a simple check to see whether the cTakes output included the annotation specified by the gold standard. The initial results I reported were for exact matches of CUI and text span; only exact matches were counted. If we also count as matches cTakes annotations with a matching CUI and a text span that overlaps the gold standard text span, then the matches increase to 224 matching annotations for the FastUMLS pipeline and 2319 for the old pipeline. The question was also asked about annotations in the cTakes output that were not in the human-adjudicated gold standard. The answer is yes: there were a lot of additional annotations made by cTakes that don't appear to be in the gold standard. We haven't analyzed that yet, but it looks like the gold standard we are using may only have Disease_Disorder annotations.
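[Editor's note: Guergana's suggestion (estimate P/R/F1 against a gold standard restricted to one mention type) combined with Bruce's looser matching rule (same CUI, overlapping spans) can be sketched as below. Annotations are ((begin, end), cui) pairs; the helper names are illustrative, not part of cTAKES or the comparison tool:

```python
def overlaps(a, b):
    """True when half-open spans (begin, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def score(system, gold):
    """Precision/recall/F1 with CUI-equal, span-overlap matching."""
    def matched(x, y):
        return x[1] == y[1] and overlaps(x[0], y[0])
    # count system annotations with a gold match, and gold annotations
    # with a system match, separately (the two counts can differ)
    tp_sys = sum(1 for s in system if any(matched(s, g) for g in gold))
    tp_gold = sum(1 for g in gold if any(matched(s, g) for s in system))
    p = tp_sys / len(system) if system else 0.0
    r = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# One system annotation overlaps a gold span with the same CUI; the other misses.
p, r, f1 = score(
    system=[((0, 5), "C0009402"), ((8, 12), "C0006826")],
    gold=[((1, 4), "C0009402"), ((20, 25), "C0018681")],
)
print(p, r, f1)  # 0.5 0.5 0.5
```
]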
RE: cTakes Annotation Comparison
One quick mention: the cTakes dictionaries are built with UMLS 2011AB. If the human annotations were not done using the same UMLS version, then there WILL be differences in CUIs and semantic groups. I don't have time to go into the details and examples; just be aware that every 6 months CUIs are added, removed, deprecated, and moved from one TUI to another. Sean
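[Editor's note: the version drift Sean describes can be checked mechanically by diffing the CUI inventories of two releases. The sketch below uses toy sets; a real comparison would read each release's MRCONSO.RRF, and the function name is hypothetical:

```python
def cui_drift(old_release, new_release):
    """Summarize how a CUI inventory changes between two UMLS releases."""
    old, new = set(old_release), set(new_release)
    return {
        "retired": sorted(old - new),  # removed or deprecated in the newer release
        "added": sorted(new - old),
        "unchanged": len(old & new),
    }

drift = cui_drift({"C0000001", "C0000002", "C0000003"},
                  {"C0000002", "C0000003", "C0000004"})
print(drift)  # {'retired': ['C0000001'], 'added': ['C0000004'], 'unchanged': 2}
```
]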
Re: cTakes Annotation Comparison
Sean, I don't think that would be an issue, since both the rare word lookup and the first word lookup are using UMLS 2011AB. Or is the rare word lookup using a different dictionary? I would expect roughly similar results between the two when it comes to differences between UMLS versions. IMAT Solutions http://imatsolutions.com Kim Ebert Software Engineer Office: 801.669.7342 kim.eb...@imatsolutions.com
RE: cTakes Annotation Comparison
I'm bringing it up in case the human annotations were done using a different version.
Re: cTakes Annotation Comparison
Pei, I don't think bugs/issues should be part of determining whether one algorithm or the other is superior. Obviously it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance, versus the slower but more accurate first word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues. Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now, I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced either. IMAT Solutions http://imatsolutions.com Kim Ebert Software Engineer Office: 801.669.7342 kim.eb...@imatsolutions.com On 12/19/2014 08:39 AM, Chen, Pei wrote: Also check out the stats that Sean ran before releasing the new component: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned… I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now, plagued with bugs/issues as you mentioned.) --Pei
Re: cTakes Annotation Comparison
Rather than spam the mailing list with the list of filenames for the files in the set we used, I would be happy to send it to anyone interested privately. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 11:47 AM, Kim Ebert kim.eb...@imatsolutions.com wrote: Pei, I don't think bugs/issues should be part of determining if one algorithm vs the other is superior. Obviously, it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance, vs the slower but more accurate first word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues. Now I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; bus as of right now, I'm not convinced yet that the old dictionary lookup is obsolete yet, and I'm not sure the community is convinced yet either. [image: IMAT Solutions] http://imatsolutions.com Kim Ebert Software Engineer [image: Office:] 801.669.7342 kim.eb...@imatsolutions.com greg.hub...@imatsolutions.com On 12/19/2014 08:39 AM, Chen, Pei wrote: Also check out stats that Sean ran before releasing the new component on: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different than what Bruce mentioned… I’m sure Sean will chime here. (The old dictionary lookup is essentially obsolete now- plagued with bugs/issues as you mentioned.) 
--Pei *From:* Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com kim.eb...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 10:25 AM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison Guergana, I'm curious to the number of records that are in your gold standard sets, or if your gold standard set was run through a long running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time as patterns are used the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have less codes returned than the first time. For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release. https://issues.apache.org/jira/browse/CTAKES-310 Depending upon the corpus size, I could see the permutation engine eventually only have a single permutation of 1,2,3,4. Typically though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system. [image: IMAT Solutions] http://imatsolutions.com *Kim Ebert* Software Engineer [image: Office:]801.669.7342 kim.eb...@imatsolutions.com greg.hub...@imatsolutions.com On 12/19/2014 07:05 AM, Savova, Guergana wrote: We are doing a similar kind of evaluation and will report the results. Before we released the Fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary look up and the fast one were similar. Thank you everyone! 
--Guergana -Original Message- From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, December 19, 2014 9:02 AM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today: every time the word "today" appears in any text it is annotated as a medication mention, when it is almost never being used in that sense. These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project. - Dave On Thu, Dec 18, 2014 at 5:06 PM
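The permutation-corruption bug Kim describes above (CTAKES-310) can be sketched in a few lines. This is a minimal illustration of the failure mode only, not actual cTAKES code: a shared table of token-order permutations gets normalized in place, so a stored order like 4,2,3,1 silently becomes 1,2,3,4 for every later document until restart.

```python
# Minimal sketch (illustrative, not cTAKES source) of the CTAKES-310 failure
# mode: a lookup table of token-order permutations is shared across documents,
# and an in-place sort corrupts its entries after the first use.

shared_table = [[1, 2, 3, 4], [4, 2, 3, 1]]

def lookup(token_order, permutations):
    """Report whether token_order is a known permutation."""
    hit = token_order in permutations
    for perm in permutations:
        perm.sort()  # BUG: in-place sort corrupts the shared table
    return hit

first = lookup([4, 2, 3, 1], shared_table)   # True: found on the first pass
second = lookup([4, 2, 3, 1], shared_table)  # False: table now holds only 1,2,3,4
```

This matches the observed symptom: the first pass over a document finds the term, and every later pass returns fewer codes until the process is restarted.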
Re: cTakes Annotation Comparison
Correction -- so far, I did steps 1 and 2 of Sean's email. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:22 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215). Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Kim, Maintenance is the deciding factor in forging ahead, not bugs/issues. They are two components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required; it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix it and move forward.) We can keep the old component around for as long as needed, but it's likely going to have limited support… --Pei *From:* Kim Ebert [mailto:kim.eb...@imatsolutions.com] *Sent:* Friday, December 19, 2014 1:47 PM *To:* Chen, Pei; dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison [snip] 
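The "exact span" versus "overlapping span" counts Bruce reports can be reproduced with a simple span comparison. A minimal sketch, with illustrative names and offsets (not from the cTAKES code base or Bruce's actual tool): an exact match requires identical (begin, end) character offsets, while an overlap match only requires the system span to intersect the gold span.

```python
# Illustrative span-matching sketch: spans are (begin, end) offset pairs.

def exact_matches(system_spans, gold_spans):
    """Count gold spans reproduced with identical offsets."""
    return len(set(system_spans) & set(gold_spans))

def overlap_matches(system_spans, gold_spans):
    """Count gold spans that intersect at least one system span."""
    return sum(1 for gb, ge in gold_spans
               if any(sb < ge and gb < se for sb, se in system_spans))

gold = [(10, 22), (40, 46)]      # e.g. spans for "colon cancer", "cancer"
system = [(10, 22), (41, 46)]    # second span is off by one character
exact_matches(system, gold)      # 1
overlap_matches(system, gold)    # 2
```

The overlap count is always at least the exact count under this definition, which is why Sean finds the 215-to-211 decrease so strange.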
RE: cTakes Annotation Comparison
Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap. Decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd have only two matches per document (100 docs?). Thanks, Sean -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:23 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215). Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce [snip] 
Re: cTakes Annotation Comparison
My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in; there were no changes to any of the configuration files. For this last run, I modified UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified files I used (but they may not get through the mailing list). [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. [snip] 
RE: cTakes Annotation Comparison
Hi Bruce, Correction -- so far, I did steps 1 and 2 of Sean's email. No problem. Aside from recreating the database, those two steps have the greatest impact. But before you change anything else, please do some manual spot checks. I have never seen a case where the lookup would be so horribly inaccurate. Thanks -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:29 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Correction -- so far, I did steps 1 and 2 of Sean's email. [snip] 
Re: cTakes Annotation Comparison
I'll do that -- there is always the possibility of bugs in the analysis tool. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Sorry, I meant "Do some spot checks on the validity". In other words, when your script reports that a CUI and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like. Thanks, Sean *From:* Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 3:37 PM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in. [snip] 
Re: cTakes Annotation Comparison
My apologies to Sean and everyone. I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list. With the bug fixed, the results look MUCH better: UMLSProcessor found 31,598 annotations; FastUMLSProcessor found 30,716 annotations; 23,522 annotations were exact matches between the two. When comparing with the gold standard annotations (4,591 annotations): UMLSProcessor found 2,632 matches (2,735 including overlaps); FastUMLSProcessor found 2,795 matches (2,842 including overlaps). [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: I'll do that -- there is always the possibility of bugs in the analysis tool. [snip] 
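The analysis-tool bug Bruce found is a classic off-by-one. A hypothetical illustration (generic identifiers, not Bruce's actual tool): a loop bound of length minus one silently drops the last entry of every FSArray-style list, undercounting matches in every document.

```python
# Hypothetical sketch of an off-by-one that drops the last list entry.
fs_array = ["cui-1", "cui-2", "cui-3"]  # stands in for a UIMA FSArray

buggy = [fs_array[i] for i in range(len(fs_array) - 1)]  # drops the last entry
fixed = [fs_array[i] for i in range(len(fs_array))]      # keeps all entries

buggy  # ['cui-1', 'cui-2']
fixed  # ['cui-1', 'cui-2', 'cui-3']
```

A one-element undercount per list compounds quickly across tens of thousands of annotations, which is consistent with the dramatic jump in Bruce's corrected numbers.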
RE: cTakes Annotation Comparison --- (^:
Apologies accepted; I'm really glad that you found the problem. So, what you are saying is (just to be very, very clear to everybody reading this thread): FastUMLSProcessor found 2,795 matches (2,842 including overlaps) while UMLSProcessor found 2,632 matches (2,735 including overlaps) --- so recall is BETTER in the fast lookup. And FastUMLSProcessor found 30,716 annotations while UMLSProcessor found 31,598 annotations --- so precision is also looking BETTER in the fast lookup. Now maybe there will be a little more buy-in for the fast lookup. Cheers, Sean -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 5:05 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison My apologies to Sean and everyone. I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list. [snip] 
In other words, when your script reports that a cui and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like. Thanks, Sean
*From:* Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 3:37 PM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison
My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in. There were no changes to any of the configuration files. As for this last run, I modified the UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified ones I used (but they may not get through the mailing list).
On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap. Decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd only have two matches per document (100 docs?). Thanks, Sean
-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:23 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison
Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215).
Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce
On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Kim, Maintenance, not bugs/issues, is the deciding factor in forging ahead. They are two components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it’s just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix them and move forward.) We can keep the old component around for as long as needed, but it’s likely going to have limited support… --Pei
*From:* Kim Ebert [mailto:kim.eb...@imatsolutions.com] *Sent:* Friday, December 19, 2014 1:47 PM *To:* Chen, Pei; dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison
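Stepping back from the quoted thread, the match counts above imply rough precision/recall/F1 figures. Below is a back-of-the-envelope sketch (not part of the original discussion); note that "precision" computed this way is only a crude proxy, since the pipelines emit annotation types the gold standard does not cover, a point the category-filtered counts later in the thread address.

```java
// Rough precision/recall/F1 from the counts quoted in this thread.
// Caveat: precision against the gold set is only approximate, because the
// pipelines also emit annotation types the gold standard does not include.
public class PrfSketch {
    static double[] prf(int matches, int found, int gold) {
        double p = (double) matches / found;   // precision: matches / system annotations
        double r = (double) matches / gold;    // recall: matches / gold annotations
        double f1 = 2 * p * r / (p + r);       // harmonic mean of p and r
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        int gold = 4591;                               // gold annotations (excluding CUI-less)
        double[] fast = prf(2795, 30716, gold);        // FastUMLSProcessor
        double[] slow = prf(2632, 31598, gold);        // UMLSProcessor
        System.out.printf("fast: P=%.4f R=%.4f F1=%.4f%n", fast[0], fast[1], fast[2]);
        System.out.printf("old : P=%.4f R=%.4f F1=%.4f%n", slow[0], slow[1], slow[2]);
    }
}
```

Swapping in the category-filtered totals (12,811 and 46,571) for `found` gives a fairer precision estimate against a gold set limited to SignSymptomMention and DiseaseDisorderMention.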
Re: cTakes Annotation Comparison
Bruce, I think we all feel a lot better now. I think the tool will be helpful moving forward. I've updated the git repo with the fix in case anyone is interested.
IMAT Solutions http://imatsolutions.com
Kim Ebert, Software Engineer, Office: 801.669.7342, kim.eb...@imatsolutions.com
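The "missing last FSArray entry" bug Bruce describes is a classic off-by-one when iterating an array-like structure. A minimal, self-contained sketch of the likely pattern (a plain `int[]` stands in for the real UIMA `FSArray`, and the method names are illustrative, not the actual tool's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the off-by-one: looping to length - 1 with '<' silently drops
// the final element of every array, which is exactly the symptom reported.
public class FsArrayBugSketch {
    // Buggy variant: stops one element early.
    static List<Integer> collectBuggy(int[] arr) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < arr.length - 1; i++) {  // BUG: skips the last entry
            out.add(arr[i]);
        }
        return out;
    }

    // Fixed variant: visits every element.
    static List<Integer> collectFixed(int[] arr) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < arr.length; i++) {
            out.add(arr[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] entries = {10, 20, 30};
        System.out.println(collectBuggy(entries)); // drops the 30
        System.out.println(collectFixed(entries));
    }
}
```

A bug of this shape undercounts every document's annotations uniformly, which is consistent with the large across-the-board jump in match counts once it was fixed.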
Re: cTakes Annotation Comparison
When I only include SignSymptomMention and DiseaseDisorderMention in the analysis (which excludes annotation types not present in the gold standard), the matched annotations remain the same, while the total annotations found in those categories drop to the following:
Total annotations found: FastUMLSProcessing: 12,811; UMLSProcessing: 46,571
IMAT Solutions http://imatsolutions.com
Bruce Tietjen, Senior Software Engineer, Mobile: 801.634.1547, bruce.tiet...@imatsolutions.com
RE: cTakes Annotation Comparison
Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth comparing against an adjudicated, human-annotated gold standard. --Pei
-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Thursday, December 18, 2014 1:45 PM To: dev@ctakes.apache.org Subject: cTakes Annotation Comparison
With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes with its associated set of UMLS resources. We chose the SHARE 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42, Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.
Documents processed: 199
Processing time: UMLSProcessor 2,439 seconds; FastUMLSProcessor 1,837 seconds
Total annotations reported: UMLSProcessor 20,365; FastUMLSProcessor 8,284
Annotation comparisons: common to both sets: 3,940; reported only by the UMLSProcessor: 16,425; reported only by the FastUMLSProcessor: 4,344
If anyone is interested, the following was our test procedure: We used the UIMA CPE to process the document set twice, once with the AggregatePlaintextUMLSProcessor pipeline and once with the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We then used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets.
The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as an identified source-text interval, with its associated CUI, appearing in both annotation sets. A 'miss' is an identified source-text interval and its associated CUI appearing in one annotation set with no matching interval and CUI in the other. The tool also reports the total number of annotations (source-text intervals with associated CUIs) in each annotation set. The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
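The match/miss logic described above amounts to a set comparison over (begin, end, CUI) triples. A minimal illustration of that idea follows; the record name, fields, and sample CUI strings are hypothetical stand-ins, not the actual cTAKES-compare tool's code:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the comparison described above: each annotation is keyed by its
// source-text interval plus CUI; a 'match' appears in both sets, a 'miss'
// appears in exactly one. Names here are illustrative only.
public class AnnotationCompareSketch {
    record Key(int begin, int end, String cui) {}

    static int countMatches(Set<Key> a, Set<Key> b) {
        Set<Key> both = new HashSet<>(a);
        both.retainAll(b);               // intersection: exact span + CUI in both sets
        return both.size();
    }

    static int countMisses(Set<Key> a, Set<Key> b) {
        Set<Key> union = new HashSet<>(a);
        union.addAll(b);
        // symmetric difference: annotations present in only one of the two sets
        return union.size() - countMatches(a, b);
    }

    public static void main(String[] args) {
        Set<Key> run1 = Set.of(new Key(0, 12, "C0000001"), new Key(20, 26, "C0000002"));
        Set<Key> run2 = Set.of(new Key(0, 12, "C0000001"));
        System.out.println("matches: " + countMatches(run1, run2)); // shared triple
        System.out.println("misses:  " + countMisses(run1, run2));  // one-sided triple
    }
}
```

The overlap variant mentioned in the thread would relax the exact-interval equality to an interval-intersection test while still requiring the CUIs to agree.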
Re: cTakes Annotation Comparison
Actually, we are working on a similar tool to compare against the human-adjudicated standard for the set we tested. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding annotations marked as CUI-less) were as follows:
Human-adjudicated annotations: 4591 (excluding CUI-less)
Annotations found matching the human-adjudicated standard: UMLSProcessor 2245; FastUMLSProcessor 215
IMAT Solutions http://imatsolutions.com
Bruce Tietjen, Senior Software Engineer, Mobile: 801.634.1547, bruce.tiet...@imatsolutions.com
On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth comparing against an adjudicated, human-annotated gold standard. --Pei