RE: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]
Hi James,

Great question. In truth, you may need to run a few times to find out. Doing that with a full pipeline would be tedious, but there is a descriptor in clinical-pipeline named CuisOnlyPlaintextUMLSProcessor.xml that will only obtain UMLS CUIs. It runs ~50,000 notes per hour on my laptop as-is, so I suggest that you test with that AE. It has LVG commented out by default (for speed). Adding LVG will increase the runtime, but it also will (as you know) find a few additional terms. You can try a few configurations without it and then the best option with it. If you want to test the default dictionary lookup, then you can certainly swap the referenced lookup XMLs.

Changes to the fast dictionary configuration are made in two places:
1. The main descriptor: ...-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
2. The resource (dictionary) configuration file: resources/.../fast/cTakesHsql.xml

A few suggestions, in order of impact:

1. I am guessing that the annotations in CLEF are human annotated with longest-length spans only; in other words, "colon cancer" instead of both "colon cancer" and "cancer". To best approximate this style of annotation, edit cTakesHsql.xml in the rareWordConsumer section and change the selected implementation. By default it is DefaultTermConsumer (go figure), but you will want to use the commented-out PrecisionTermConsumer. As the comment in cTakesHsql.xml indicates, DefaultTermConsumer will persist all spans, while PrecisionTermConsumer will persist only the longest overlapping span of any semantic group. Doing this should increase precision and, depending upon how good the annotations are, it should not greatly change recall.

2. Just for kicks, try using SemanticCleanupTermConsumer. It may slightly increase precision, but it also may decrease recall. Hopefully it doesn't do much at all (PrecisionTermConsumer and proper semantic typing in the dictionary should suffice without this term consumer).

3. Especially for task 2 (acronyms and abbreviations), you should try a run with the minimumSpan parameter in UmlsLookupAnnotator.xml set to 2. This changes the minimum allowable character span of a term. The default is 3 to increase precision on acronyms and abbreviations, but decreasing it to 2 may improve recall on the same. The dictionary is not built with anything below 2 characters.

4. On that note (character length), if task 1 does not include acronyms and abbreviations, then you can try increasing the minimum span length above 3 and see if there is a good increase in precision without a significant decrease in recall.

5. Try a few runs with overlapping spans in addition to exact matches. To do this, use OverlapJCasTermAnnotator instead of DefaultJCasTermAnnotator as the annotator implementation. DefaultJCasTermAnnotator is specified in UmlsLookupAnnotator.xml, but I will check in a descriptor for overlap matching. There are additional parameters for that option, but I'll email them after I check in.

6. By default the new lookup uses Sentence as the lookup window. I did this for three reasons: (1) not all terms are within noun phrases, (2) some noun phrases overlapped, causing repeated lookups (in my 3.0 candidate trials), and (3) not all cTAKES noun phrases are accurate. Because the lookup is fast, using a full Sentence for lookup doesn't seem to hurt much. However, you can always switch it back to see if precision is increased enough to warrant the decrease in recall. This is also changed in UmlsLookupAnnotator.xml.

I have run my own tests with the various setups, but I don't want to adversely influence what you run, just in case the trends with the ShARe/CLEF annotations differ.

Sean

-----Original Message-----
From: Masanz, James J.
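Concretely, suggestions 3 and 4 above come down to one configuration parameter in the annotator descriptor. The fragment below is a sketch using the standard UIMA configuration-parameter layout; the surrounding descriptor elements (and the exact line it lands on in your checkout) are omitted, so verify against the shipped UmlsLookupAnnotator.xml:

```xml
<!-- UmlsLookupAnnotator.xml: the minimum allowable term span (suggestion 3).
     Sketch only: this shows the standard UIMA configurationParameterSettings
     nameValuePair form; check the shipped descriptor for exact placement. -->
<nameValuePair>
  <name>minimumSpan</name>
  <value>
    <!-- default is 3; 2 may improve recall on acronyms/abbreviations -->
    <integer>2</integer>
  </value>
</nameValuePair>
```

The term-consumer change of suggestion 1 is simpler still: in cTakesHsql.xml, comment out the line selecting DefaultTermConsumer and uncomment the PrecisionTermConsumer line that ships alongside it.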
[mailto:masanz.ja...@mayo.edu]
Sent: Friday, January 09, 2015 3:57 PM
To: 'dev@ctakes.apache.org'
Subject: dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]

Sean (or others),

Of the various configuration options described below, which values/choices would you recommend for best F1 measure for something like the shared CLEF 2013 task? https://sites.google.com/site/shareclefehealth/ I'm looking for something that doesn't have to be the best speed-wise, but that is recommended for optimizing F1 measure.

Regards,
James
dictionary lookup config for best F1 measure [was RE: cTakes Annotation Comparison]
Sean (or others),

Of the various configuration options described below, which values/choices would you recommend for best F1 measure for something like the shared CLEF 2013 task? https://sites.google.com/site/shareclefehealth/ I'm looking for something that doesn't have to be the best speed-wise, but that is recommended for optimizing F1 measure.

Regards,
James

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Friday, December 19, 2014 11:55 AM
To: dev@ctakes.apache.org; kim.eb...@imatsolutions.com
Subject: RE: cTakes Annotation Comparison

Well, I guess that it is time for me to speak up ... I must say that I'm happy that people are showing interest in the fast lookup. I am also happy (sort of) that some concerns are being raised, and that there is now community participation in my little toy. I have some concerns about what people are reporting; it does not coincide with what I have seen at all.

Yesterday I started (without knowing this thread existed) testing a bare-minimum pipeline for CUI extraction. It is just the stripped-down aggregate with only: segment, tokens, sentences, POS, and the fast lookup. The people at Children's wanted to know how fast we could get: 1,196 notes in under 90 seconds on my laptop, with over 210,000 annotations, which is about 175 per note. After reading the thread I decided to run the fast lookup with several configurations. I also ran the default for 10.5 hours. I am comparing the annotations from each system against the human annotations that we have, and I will let everybody know what I find, for better or worse.

The fast lookup does not (out of the box) do the exact same thing as the default. Some things can be configured to make it more closely approximate the default dictionary:

1. Set the minimum annotation span length to 2 (default is 3). This is in desc/[ae]/UmlsLookupAnnotator.xml, line #78. The annotator should then pick up text like "CT" and improve recall, but it will hurt precision.

2. Set the lookup window to LookupWindowAnnotation. This is in desc/[ae]/UmlsLookupAnnotator.xml, lines #65 and #93. The LookupWindowAnnotator will need to be added to the aggregate pipeline AggregatePlaintextFastUMLSProcessor.xml, lines #50 and #172. This will narrow the lookup window and may increase precision, but (in my experience) reduces recall.

3. Allow the (rough) identification of overlapping spans. The default dictionary will often identify text like "metastatic colorectal carcinoma" when that text actually does not exist anywhere in UMLS; it basically ignores "colorectal" and gives the whole span the CUI for "metastatic carcinoma". In this case it is arguably a good thing; in many others it is arguably not so much. There is a class ... lookup2.ae.OverlapJCasTermAnnotator.java that will do the same thing. You can create a new desc/[ae]/*Annotator.xml or just change the annotatorImplementationName in desc/[ae]/UmlsLookupAnnotator.xml, line #25. I will check in a new desc xml (sorry; thought I had) because there are 2 parameters unique to OverlapJCasTermAnnotator.

4. You can play with the OverlapJCasTermAnnotator parameters "consecutiveSkips" and "totalTokenSkips". These control just how lenient you want the overlap tagging to be.

5. Create a new dictionary database. There is a (bit messy) DictionaryTool in sandbox that will let you dump whatever you do or do not want from UMLS into a database. It will also help you clean up or select stored entries as well. There is a lot of garbage in the default dictionary database: repeated terms with caps/no caps ("Cancer", "cancer"), text with metadata ("cancer [finding]"), and text that just clutters ("PhenX: entry for cancer", "1", "2"). The fast lookup database should have most of the SNOMED and RxNorm terms (and synonyms) of interest, but you could always make a new database that is much more inclusive.

The main key to the speed of the fast dictionary lookup is actually ... the key. It is the way that the database is indexed and the lookup by "rare" word instead of "first" word. Everything else can be changed around it and it should still be a faster version. As for the false positives like "Today", that will always be a problem until we have disambiguation; the lookup is basically a glorified grep.

Sean
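The "rare word" indexing Sean credits for the speed can be sketched roughly as follows. This is only an illustration of the idea with hypothetical class and method names, not the real code in ctakes-dictionary-lookup-fast: each multi-word term is indexed under its least frequent token, so a lookup window only pulls in candidate terms whose rare word actually appears there, rather than fanning out on common first words.

```java
import java.util.*;

// Sketch of rare-word-first dictionary lookup (hypothetical names, not the
// actual cTAKES classes). Terms are indexed by their rarest token; a lookup
// window only triggers candidates whose rare token occurs in the window.
class RareWordIndexSketch {

    // rare token -> terms indexed under it
    private final Map<String, List<String>> index = new HashMap<>();
    private final Map<String, Integer> tokenFrequency;

    RareWordIndexSketch(Map<String, Integer> tokenFrequency) {
        this.tokenFrequency = tokenFrequency;
    }

    void addTerm(String term) {
        String rare = null;
        for (String tok : term.split(" ")) {
            if (rare == null || tokenFrequency.getOrDefault(tok, 0)
                    < tokenFrequency.getOrDefault(rare, 0)) {
                rare = tok;   // keep the least frequent token as the key
            }
        }
        index.computeIfAbsent(rare, k -> new ArrayList<>()).add(term);
    }

    // Return indexed terms fully contained in the lookup window.
    List<String> lookup(List<String> windowTokens) {
        List<String> hits = new ArrayList<>();
        String window = " " + String.join(" ", windowTokens) + " ";
        for (String tok : windowTokens) {
            for (String term : index.getOrDefault(tok, List.of())) {
                if (window.contains(" " + term + " ")) {
                    hits.add(term);
                }
            }
        }
        return hits;
    }
}
```

The point of the design, as Sean says, is that everything else (term consumers, span limits, overlap matching) can change around this key without losing the speed advantage.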
Re: cTakes Annotation Comparison
Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my small tests that I've done in a non-systematic way. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today: every time the word "today" appears in any text it is annotated as a medication mention, when it almost never is being used in that sense.

These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote:

Actually, we are working on a similar tool to compare it to the human adjudicated standard for the set we tested against. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding those marked as CUI-less) were as follows:

Human adjudicated annotations: 4,591 (excluding CUI-less)
Annotations found matching the human adjudicated standard:
  UMLSProcessor      2,245
  FastUMLSProcessor    215

IMAT Solutions http://imatsolutions.com
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth it to compare to an adjudicated human annotated gold standard.
--Pei

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTAKES 3.2.1, we were very interested in checking for any differences in annotations between using the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTAKES with its associated set of UMLS resources. We chose to use the ShARe 14-a-b training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.

Documents processed: 199

Processing time:
  UMLSProcessor       2,439 seconds
  FastUMLSProcessor   1,837 seconds

Total annotations reported:
  UMLSProcessor      20,365 annotations
  FastUMLSProcessor   8,284 annotations

Annotation comparisons:
  Annotations common to both sets:                     3,940
  Annotations reported only by the UMLSProcessor:     16,425
  Annotations reported only by the FastUMLSProcessor:  4,344

If anyone is interested, the following was our test procedure. We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets.

The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as the presence of an identified source text interval with its associated CUI appearing in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) reported in each annotation set.

The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
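Bruce's 'match'/'miss' definition amounts to set operations over (begin, end, CUI) triples. A minimal sketch of that comparison follows; the record and method names are illustrative, not taken from the actual cTAKES-compare tool linked above:

```java
import java.util.*;

// Sketch of the match/miss comparison described above: an annotation is
// reduced to its source text interval plus CUI, and a "match" is a triple
// present in both pipelines' output sets. Illustrative names only.
class AnnotationCompareSketch {

    // (begin, end, cui) identifies an annotation for comparison purposes
    record Span(int begin, int end, String cui) {}

    // Triples present in both sets: Bruce's "matches".
    static Set<Span> matches(Set<Span> a, Set<Span> b) {
        Set<Span> common = new HashSet<>(a);
        common.retainAll(b);
        return common;
    }

    // Triples present in one set but not the other: Bruce's "misses".
    static Set<Span> misses(Set<Span> from, Set<Span> other) {
        Set<Span> only = new HashSet<>(from);
        only.removeAll(other);
        return only;
    }
}
```

As a sanity check on the reported counts: 3,940 matches plus 16,425 UMLSProcessor-only misses gives that pipeline's 20,365 total, and 3,940 + 4,344 = 8,284 for the fast pipeline.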
RE: cTakes Annotation Comparison
We are doing a similar kind of evaluation and will report the results. Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R, and F1 results from the old dictionary lookup and the fast one were similar.

Thank you everyone!
--Guergana
Re: cTakes Annotation Comparison
Guergana,

I'm curious about the number of records that are in your gold standard sets, and whether your gold standard set was run through a long-running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time, as patterns are used, the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release: https://issues.apache.org/jira/browse/CTAKES-310

Depending upon the corpus size, I could see the permutation engine eventually having only a single permutation of 1,2,3,4. Typically, though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system.

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results. Before we released the fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R, and F1 results from the old dictionary lookup and the fast one were similar.
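The permutation corruption Kim describes behaves like an in-place sort of shared lookup state. The sketch below is a hypothetical reconstruction of that failure mode, not the actual cTAKES lookup code (see CTAKES-310 for the real fix): a shared table of token-order permutations is mutated while being used, so after the first document only the sorted order 1,2,3,4 remains detectable.

```java
import java.util.*;

// Hypothetical illustration of the permutation-corruption bug described
// above (not the actual cTAKES code). A table of token-order permutations
// is shared across all documents in a long-running pipeline; the buggy
// step sorts each permutation in place while consulting it.
class PermutationBugSketch {

    // Shared across every document the pipeline processes.
    static final List<int[]> PERMUTATION_TABLE = new ArrayList<>(
            List.of(new int[]{4, 2, 3, 1}, new int[]{2, 1}));

    // Buggy: sorts the shared permutations in place, so after the first
    // document the 4,2,3,1 pattern is gone until the process restarts.
    static List<int[]> permutationsForDocumentBuggy() {
        for (int[] p : PERMUTATION_TABLE) {
            Arrays.sort(p);          // mutates the shared table!
        }
        return PERMUTATION_TABLE;
    }

    // Fixed: work on copies, leaving the shared table intact.
    static List<int[]> permutationsForDocumentFixed() {
        List<int[]> copy = new ArrayList<>();
        for (int[] p : PERMUTATION_TABLE) {
            copy.add(p.clone());
        }
        return copy;
    }
}
```

This also explains why the effect only shows up in long runs: each permutation is corrupted the first time it fires, so early documents look fine and later ones quietly lose matches.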
RE: cTakes Annotation Comparison
Also check out the stats that Sean ran before releasing the new component: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx

From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned; I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now, plagued with bugs/issues as you mentioned.)

--Pei
Re: cTakes Annotation Comparison
Thanks Kim, This sounds interesting though I don't totally understand it. Are you saying that extraction performance for a given note depends on which order the note was in the processing queue? If so that's pretty bad! If you (or anyone else who understands this issue) has a concrete example I think that might help me understand what the problem is/was. Even though, as Pei mentioned, we are going to try moving the community to the faster dictionary, I would like to understand better just to help myself avoid issues of this type going forward (and verify the new dictionary doesn't use similar logic). Also, when we finish annotating the sample notes, might we use that as a point of comparison for the two dictionaries? That would get around the issue that not everyone has access to the datasets we used for validation and others are likely not able to share theirs either. And maybe we can replicate the notes if we want to simulate the scenario Kim is talking about with thousands or more notes. Tim On 12/19/2014 10:24 AM, Kim Ebert wrote: Guergana, I'm curious to the number of records that are in your gold standard sets, or if your gold standard set was run through a long running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time as patterns are used the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have less codes returned than the first time. For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release. https://issues.apache.org/jira/browse/CTAKES-310 Depending upon the corpus size, I could see the permutation engine eventually only have a single permutation of 1,2,3,4. 
Typically though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system. [IMAT Solutions]http://imatsolutions.com Kim Ebert Software Engineer [Office:] 801.669.7342 kim.eb...@imatsolutions.commailto:greg.hub...@imatsolutions.com On 12/19/2014 07:05 AM, Savova, Guergana wrote: We are doing a similar kind of evaluation and will report the results. Before we released the Fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary look up and the fast one were similar. Thank you everyone! --Guergana -Original Message- From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, December 19, 2014 9:02 AM To: dev@ctakes.apache.orgmailto:dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my small tests that I've done in a non-systematic way. Did you happen to capture the number of false positives yet (annotations made by cTAKES that are not in the human adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today. Every time the word today appears in any text it is annotated as a medication mention when it almost never is being used in that sense. These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project. 
- Dave On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: Actually, we are working on a similar tool to compare it to the human-adjudicated standard for the set we tested against. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding those marked as CUI-less) were as follows: Human-adjudicated annotations: 4591 (excluding CUI-less). Annotations found matching the human-adjudicated standard: UMLSProcessor 2245, FastUMLSProcessor 215. [IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Bruce, Thanks for this-- very useful. Perhaps Sean Finan can comment more, but it's also probably
Re: cTakes Annotation Comparison
worth it to compare to an adjudicated, human-annotated gold standard. --Pei -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Thursday, December 18, 2014 1:45 PM To: dev@ctakes.apache.org Subject: cTakes Annotation Comparison With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between using the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes with its associated set of UMLS resources. We chose to use the ShARe 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42 and Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.
Documents Processed: 199
Processing Time: UMLSProcessor 2,439 seconds; FastUMLSProcessor 1,837 seconds
Total Annotations Reported: UMLSProcessor 20,365 annotations; FastUMLSProcessor 8,284 annotations
Annotation Comparisons:
  Annotations common to both sets: 3,940
  Annotations reported only by the UMLSProcessor: 16,425
  Annotations reported only by the FastUMLSProcessor: 4,344
If anyone is interested, the following was our test procedure: We used the UIMA CPE to process the document set twice, once using the AggregatePlaintextUMLSProcessor pipeline and once using the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets. The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as the presence of an identified source text interval with its associated CUI appearing in both annotation sets. A 'miss' is defined as the presence of an identified source text interval and its associated CUI in one annotation set, but no matching identified source text interval and CUI in the other. The tool also reports the total number of annotations (source text intervals with associated CUIs) reported in each
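[Editor's note: the exact-match comparison defined above (same source text interval and same CUI in both outputs) can be sketched with plain set arithmetic. The tuple layout and helper name below are hypothetical, not Bruce's actual tool:

```python
def compare_annotations(run_a, run_b):
    """Count matches and misses between two pipeline outputs.
    Each annotation is a (begin, end, cui) tuple; a 'match' needs the
    same span AND the same CUI in both sets, per the definition above."""
    a, b = set(run_a), set(run_b)
    return {
        "common": len(a & b),   # matches: present in both outputs
        "only_a": len(a - b),   # misses: reported only by pipeline A
        "only_b": len(b - a),   # misses: reported only by pipeline B
    }

counts = compare_annotations(
    {(0, 12, "C0009402"), (20, 26, "C0006826")},  # e.g. UMLSProcessor output
    {(0, 12, "C0009402"), (30, 35, "C0018681")},  # e.g. FastUMLSProcessor output
)
print(counts)  # {'common': 1, 'only_a': 1, 'only_b': 1}
```
]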
RE: cTakes Annotation Comparison
Several thoughts: 1. The ShARe corpus annotates only mentions of type Disease/Disorder, and only Anatomical Sites associated with a Disease/Disorder. This is by design. cTAKES annotates all mentions of types Disease/Disorder, Sign/Symptom, Procedure, Medication and Anatomical Site. Therefore you will get MANY more annotations with cTAKES. Eventually the ShARe corpus will be expanded to the other types. 2. Keeping (1) in mind, you can approximately estimate the precision/recall/F1 of cTAKES on the ShARe corpus if you output only mentions of type Disease/Disorder. 3. Could you send us the list of files you use from ShARe to test? We have the corpus and would like to run against it as well. Hope this makes sense... --Guergana -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 1:16 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Our analysis against the human-adjudicated gold standard from this ShARe corpus uses a simple check to see whether the cTakes output included the annotation specified by the gold standard. The initial results I reported were for exact matches of CUI and text span; only exact matches were counted. If we also count as matches cTakes annotations with a matching CUI and a text span that overlaps the gold standard text span, then the matches increase to 224 matching annotations for the FastUMLS pipeline and 2319 for the old pipeline. The question was also asked about annotations in the cTakes output that were not in the human-adjudicated gold standard. The answer is yes: there were a lot of additional annotations made by cTakes that don't appear to be in the gold standard. We haven't analyzed that yet, but it looks like the gold standard we are using may only have Disease_Disorder annotations.
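[Editor's note: Guergana's suggestion (estimate P/R/F1 against a gold standard restricted to one mention type) combined with Bruce's looser matching rule (same CUI, overlapping spans) can be sketched as below. Annotations are ((begin, end), cui) pairs; the helper names are illustrative, not part of cTAKES or the comparison tool:

```python
def overlaps(a, b):
    """True when half-open spans (begin, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def score(system, gold):
    """Precision/recall/F1 with CUI-equal, span-overlap matching."""
    def matched(x, y):
        return x[1] == y[1] and overlaps(x[0], y[0])
    # count system annotations with a gold match, and gold annotations
    # with a system match, separately (the two counts can differ)
    tp_sys = sum(1 for s in system if any(matched(s, g) for g in gold))
    tp_gold = sum(1 for g in gold if any(matched(s, g) for s in system))
    p = tp_sys / len(system) if system else 0.0
    r = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# One system annotation overlaps a gold span with the same CUI; the other misses.
p, r, f1 = score(
    system=[((0, 5), "C0009402"), ((8, 12), "C0006826")],
    gold=[((1, 4), "C0009402"), ((20, 25), "C0018681")],
)
print(p, r, f1)  # 0.5 0.5 0.5
```
]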
RE: cTakes Annotation Comparison
One quick mention: the cTakes dictionaries are built with UMLS 2011AB. If the human annotations were not done using the same UMLS version, then there WILL be differences in CUIs and semantic groups. I don't have time to go into the details and examples; just be aware that every 6 months CUIs are added, removed, deprecated, and moved from one TUI to another. Sean
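[Editor's note: the version drift Sean describes can be checked mechanically by diffing the CUI inventories of two releases. The sketch below uses toy sets; a real comparison would read each release's MRCONSO.RRF, and the function name is hypothetical:

```python
def cui_drift(old_release, new_release):
    """Summarize how a CUI inventory changes between two UMLS releases."""
    old, new = set(old_release), set(new_release)
    return {
        "retired": sorted(old - new),  # removed or deprecated in the newer release
        "added": sorted(new - old),
        "unchanged": len(old & new),
    }

drift = cui_drift({"C0000001", "C0000002", "C0000003"},
                  {"C0000002", "C0000003", "C0000004"})
print(drift)  # {'retired': ['C0000001'], 'added': ['C0000004'], 'unchanged': 2}
```
]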
Re: cTakes Annotation Comparison
Sean, I don't think that would be an issue, since both the rare word lookup and the first word lookup are using UMLS 2011AB. Or is the rare word lookup using a different dictionary? I would expect roughly similar results between the two when it comes to differences between UMLS versions. IMAT Solutions http://imatsolutions.com Kim Ebert Software Engineer Office: 801.669.7342 kim.eb...@imatsolutions.com
RE: cTakes Annotation Comparison
I'm bringing it up in case the human annotations were done using a different version.
Re: cTakes Annotation Comparison
Pei, I don't think bugs/issues should be part of determining whether one algorithm or the other is superior. Obviously it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance, versus the slower but more accurate first word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues. Now, I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; but as of right now, I'm not convinced that the old dictionary lookup is obsolete, and I'm not sure the community is convinced either. IMAT Solutions http://imatsolutions.com Kim Ebert Software Engineer Office: 801.669.7342 kim.eb...@imatsolutions.com On 12/19/2014 08:39 AM, Chen, Pei wrote: Also check out the stats that Sean ran before releasing the new component: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different from what Bruce mentioned… I'm sure Sean will chime in here. (The old dictionary lookup is essentially obsolete now, plagued with bugs/issues as you mentioned.) --Pei
Re: cTakes Annotation Comparison
Rather than spam the mailing list with the list of filenames for the files in the set we used, I would be happy to send it to anyone interested privately. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 11:47 AM, Kim Ebert kim.eb...@imatsolutions.com wrote: Pei, I don't think bugs/issues should be part of determining if one algorithm vs the other is superior. Obviously, it is worth mentioning the bugs, but if the fast lookup method has worse precision and recall but better performance, vs the slower but more accurate first word lookup algorithm, then time should be invested in fixing those bugs and resolving those weird issues. Now I'm not saying which one is superior in this case, as the data will end up speaking for itself one way or the other; bus as of right now, I'm not convinced yet that the old dictionary lookup is obsolete yet, and I'm not sure the community is convinced yet either. [image: IMAT Solutions] http://imatsolutions.com Kim Ebert Software Engineer [image: Office:] 801.669.7342 kim.eb...@imatsolutions.com greg.hub...@imatsolutions.com On 12/19/2014 08:39 AM, Chen, Pei wrote: Also check out stats that Sean ran before releasing the new component on: http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx From the evaluation and experience, the new lookup algorithm should be a huge improvement in terms of both speed and accuracy. This is very different than what Bruce mentioned… I’m sure Sean will chime here. (The old dictionary lookup is essentially obsolete now- plagued with bugs/issues as you mentioned.) 
--Pei *From:* Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com kim.eb...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 10:25 AM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison Guergana, I'm curious to the number of records that are in your gold standard sets, or if your gold standard set was run through a long running cTAKES process. I know at some point we fixed a bug in the old dictionary lookup that caused the permutations to become corrupted over time. Typically this isn't seen in the first few records, but over time as patterns are used the permutations would become corrupted. This caused documents that were fed through cTAKES more than once to have less codes returned than the first time. For example, if a permutation of 4,2,3,1 was found, the permutation would be corrupted to be 1,2,3,4. It would no longer be possible to detect permutations of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 release. https://issues.apache.org/jira/browse/CTAKES-310 Depending upon the corpus size, I could see the permutation engine eventually only have a single permutation of 1,2,3,4. Typically though, this isn't very easily detected in the first 100 or so documents. We discovered this issue when we made cTAKES have consistent output of codes in our system. [image: IMAT Solutions] http://imatsolutions.com *Kim Ebert* Software Engineer [image: Office:]801.669.7342 kim.eb...@imatsolutions.com greg.hub...@imatsolutions.com On 12/19/2014 07:05 AM, Savova, Guergana wrote: We are doing a similar kind of evaluation and will report the results. Before we released the Fast lookup, we did a systematic evaluation across three gold standard sets. We did not see the trend that Bruce reported below. The P, R and F1 results from the old dictionary look up and the fast one were similar. Thank you everyone! 
--Guergana -Original Message- From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, December 19, 2014 9:02 AM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Thanks for this, Bruce! Very interesting work. It confirms what I've seen in the small, non-systematic tests I've done. Did you happen to capture the number of false positives (annotations made by cTAKES that are not in the human-adjudicated standard)? I've seen a lot of dictionary hits that are not actually entity mentions, but I haven't had a chance to do a systematic analysis (we're working on our annotated gold standard now). One great example is the antibiotic Today: every time the word "today" appears in any text it is annotated as a medication mention, when it is almost never being used in that sense. These results by themselves are quite disappointing to me. Both the UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor recall. It seems like the trade-off for more speed is a ten-fold (or more) decrease in entity recognition. Thanks again for sharing your results with us. I think they are very useful to the project. - Dave On Thu, Dec 18, 2014 at 5:06 PM
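The permutation-corruption bug Kim describes above (CTAKES-310) can be sketched in a few lines. This is a minimal illustration of the failure mode only, not actual cTAKES code: a shared table of token-order permutations gets normalized in place, so a stored order like 4,2,3,1 silently becomes 1,2,3,4 for every later document until restart.

```python
# Minimal sketch (illustrative, not cTAKES source) of the CTAKES-310 failure
# mode: a lookup table of token-order permutations is shared across documents,
# and an in-place sort corrupts its entries after the first use.

shared_table = [[1, 2, 3, 4], [4, 2, 3, 1]]

def lookup(token_order, permutations):
    """Report whether token_order is a known permutation."""
    hit = token_order in permutations
    for perm in permutations:
        perm.sort()  # BUG: in-place sort corrupts the shared table
    return hit

first = lookup([4, 2, 3, 1], shared_table)   # True: found on the first pass
second = lookup([4, 2, 3, 1], shared_table)  # False: table now holds only 1,2,3,4
```

This matches the observed symptom: the first pass over a document finds the term, and every later pass returns fewer codes until the process is restarted.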
Re: cTakes Annotation Comparison
Correction -- so far, I did steps 1 and 2 of Sean's email. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:22 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215). Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Kim, Maintenance is the deciding factor in forging ahead, not bugs/issues. They are two components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required; it's just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix it and move forward.) We can keep the old component around for as long as needed, but it's likely going to have limited support… --Pei *From:* Kim Ebert [mailto:kim.eb...@imatsolutions.com] *Sent:* Friday, December 19, 2014 1:47 PM *To:* Chen, Pei; dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison [snip] 
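The "exact span" versus "overlapping span" counts Bruce reports can be reproduced with a simple span comparison. A minimal sketch, with illustrative names and offsets (not from the cTAKES code base or Bruce's actual tool): an exact match requires identical (begin, end) character offsets, while an overlap match only requires the system span to intersect the gold span.

```python
# Illustrative span-matching sketch: spans are (begin, end) offset pairs.

def exact_matches(system_spans, gold_spans):
    """Count gold spans reproduced with identical offsets."""
    return len(set(system_spans) & set(gold_spans))

def overlap_matches(system_spans, gold_spans):
    """Count gold spans that intersect at least one system span."""
    return sum(1 for gb, ge in gold_spans
               if any(sb < ge and gb < se for sb, se in system_spans))

gold = [(10, 22), (40, 46)]      # e.g. spans for "colon cancer", "cancer"
system = [(10, 22), (41, 46)]    # second span is off by one character
exact_matches(system, gold)      # 1
overlap_matches(system, gold)    # 2
```

The overlap count is always at least the exact count under this definition, which is why Sean finds the 215-to-211 decrease so strange.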
RE: cTakes Annotation Comparison
Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap. Decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd have only two matches per document (100 docs?). Thanks, Sean -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:23 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215). Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce [snip] 
Re: cTakes Annotation Comparison
My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in; there were no changes to any of the configuration files. For this last run, I modified UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified files I used (but they may not get through the mailing list). [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. [snip] 
RE: cTakes Annotation Comparison
Hi Bruce, Correction -- so far, I did steps 1 and 2 of Sean's email. No problem. Aside from recreating the database, those two steps have the greatest impact. But before you change anything else, please do some manual spot checks. I have never seen a case where the lookup would be so horribly inaccurate. Thanks -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:29 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison Correction -- so far, I did steps 1 and 2 of Sean's email. [snip] 
Re: cTakes Annotation Comparison
I'll do that -- there is always the possibility of bugs in the analysis tool. [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Sorry, I meant "Do some spot checks on the validity". In other words, when your script reports that a CUI and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like. Thanks, Sean *From:* Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 3:37 PM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in. [snip] 
Re: cTakes Annotation Comparison
My apologies to Sean and everyone. I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list. With the bug fixed, the results look MUCH better: UMLSProcessor found 31,598 annotations; FastUMLSProcessor found 30,716 annotations; 23,522 annotations were exact matches between the two. When comparing with the gold standard annotations (4,591 annotations): UMLSProcessor found 2,632 matches (2,735 including overlaps); FastUMLSProcessor found 2,795 matches (2,842 including overlaps). [image: IMAT Solutions] http://imatsolutions.com Bruce Tietjen Senior Software Engineer [image: Mobile:] 801.634.1547 bruce.tiet...@imatsolutions.com On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen bruce.tiet...@perfectsearchcorp.com wrote: I'll do that -- there is always the possibility of bugs in the analysis tool. [snip] 
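The analysis-tool bug Bruce found is a classic off-by-one. A hypothetical illustration (generic identifiers, not Bruce's actual tool): a loop bound of length minus one silently drops the last entry of every FSArray-style list, undercounting matches in every document.

```python
# Hypothetical sketch of an off-by-one that drops the last list entry.
fs_array = ["cui-1", "cui-2", "cui-3"]  # stands in for a UIMA FSArray

buggy = [fs_array[i] for i in range(len(fs_array) - 1)]  # drops the last entry
fixed = [fs_array[i] for i in range(len(fs_array))]      # keeps all entries

buggy  # ['cui-1', 'cui-2']
fixed  # ['cui-1', 'cui-2', 'cui-3']
```

A one-element undercount per list compounds quickly across tens of thousands of annotations, which is consistent with the dramatic jump in Bruce's corrected numbers.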
RE: cTakes Annotation Comparison --- (^:
Apologies accepted; I'm really glad that you found the problem. So, what you are saying is (just to be very, very clear to everybody reading this thread): FastUMLSProcessor found 2,795 matches (2,842 including overlaps) while UMLSProcessor found 2,632 matches (2,735 including overlaps) --- so recall is BETTER in the fast lookup. And FastUMLSProcessor found 30,716 annotations while UMLSProcessor found 31,598 annotations --- so precision is also looking BETTER in the fast lookup. Now maybe there will be a little more buy-in for the fast lookup. Cheers, Sean -Original Message- From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 5:05 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison My apologies to Sean and everyone. I am happy to report that I found a bug in our analysis tools that was dropping the last FSArray entry of every FSArray list. [snip] 
In other words, when your script reports that a cui and/or span is missing, manually look at the data and see if it really is. Just open up one .xmi in the CVD and see what it looks like. Thanks, Sean
*From:* Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] *Sent:* Friday, December 19, 2014 3:37 PM *To:* dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison
My original results were from a newly downloaded cTakes 3.2.1 with the separately downloaded resources copied in. There were no changes to any of the configuration files. As for this last run, I modified the UMLSLookupAnnotator.xml and AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified ones I used (but they may not get through the mailing list).
On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote: Hi Bruce, I'm not sure how there would be fewer matches with the overlap processor. There should be all of the matches from the non-overlap processor plus those from the overlap. Decreasing from 215 to 211 is strange. Have you done any manual spot checks on this? It is really bizarre that you'd only have two matches per document (100 docs?). Thanks, Sean
-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Friday, December 19, 2014 3:23 PM To: dev@ctakes.apache.org Subject: Re: cTakes Annotation Comparison
Sean, I tried the configuration changes you mentioned in your earlier email. The results are as follows: Total annotations found: 12,161 (default configuration found 8,284). Counting exact span matches, this run matched only 211 (default configuration matched 215).
Counting overlapping spans, this run matched only 220 (default configuration matched 224). Bruce
On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Kim, Maintenance, not bugs/issues, is the deciding factor in forging ahead. They are two components that do the same thing with the same goal. (As Sean mentioned, one should be able to configure the new code base to replicate the old algorithm if required -- it’s just a simpler and cleaner code base. If this is not the case, or if there are issues, we should fix them and move forward.) We can keep the old component around for as long as needed, but it’s likely going to have limited support… --Pei
*From:* Kim Ebert [mailto:kim.eb...@imatsolutions.com] *Sent:* Friday, December 19, 2014 1:47 PM *To:* Chen, Pei; dev@ctakes.apache.org *Subject:* Re: cTakes Annotation Comparison
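Stepping back from the quoted thread, the match counts above imply rough precision/recall/F1 figures. Below is a back-of-the-envelope sketch (not part of the original discussion); note that "precision" computed this way is only a crude proxy, since the pipelines emit annotation types the gold standard does not cover, a point the category-filtered counts later in the thread address.

```java
// Rough precision/recall/F1 from the counts quoted in this thread.
// Caveat: precision against the gold set is only approximate, because the
// pipelines also emit annotation types the gold standard does not include.
public class PrfSketch {
    static double[] prf(int matches, int found, int gold) {
        double p = (double) matches / found;   // precision: matches / system annotations
        double r = (double) matches / gold;    // recall: matches / gold annotations
        double f1 = 2 * p * r / (p + r);       // harmonic mean of p and r
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        int gold = 4591;                               // gold annotations (excluding CUI-less)
        double[] fast = prf(2795, 30716, gold);        // FastUMLSProcessor
        double[] slow = prf(2632, 31598, gold);        // UMLSProcessor
        System.out.printf("fast: P=%.4f R=%.4f F1=%.4f%n", fast[0], fast[1], fast[2]);
        System.out.printf("old : P=%.4f R=%.4f F1=%.4f%n", slow[0], slow[1], slow[2]);
    }
}
```

Swapping in the category-filtered totals (12,811 and 46,571) for `found` gives a fairer precision estimate against a gold set limited to SignSymptomMention and DiseaseDisorderMention.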
Re: cTakes Annotation Comparison
Bruce, I think we all feel a lot better now. I think the tool will be helpful moving forward. I've updated the git repo with the fix in case anyone is interested.
IMAT Solutions http://imatsolutions.com
Kim Ebert, Software Engineer, Office: 801.669.7342, kim.eb...@imatsolutions.com
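The "missing last FSArray entry" bug Bruce describes is a classic off-by-one when iterating an array-like structure. A minimal, self-contained sketch of the likely pattern (a plain `int[]` stands in for the real UIMA `FSArray`, and the method names are illustrative, not the actual tool's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the off-by-one: looping to length - 1 with '<' silently drops
// the final element of every array, which is exactly the symptom reported.
public class FsArrayBugSketch {
    // Buggy variant: stops one element early.
    static List<Integer> collectBuggy(int[] arr) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < arr.length - 1; i++) {  // BUG: skips the last entry
            out.add(arr[i]);
        }
        return out;
    }

    // Fixed variant: visits every element.
    static List<Integer> collectFixed(int[] arr) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < arr.length; i++) {
            out.add(arr[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] entries = {10, 20, 30};
        System.out.println(collectBuggy(entries)); // drops the 30
        System.out.println(collectFixed(entries));
    }
}
```

A bug of this shape undercounts every document's annotations uniformly, which is consistent with the large across-the-board jump in match counts once it was fixed.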
Re: cTakes Annotation Comparison
When I only include SignSymptomMention and DiseaseDisorderMention in the analysis (which excludes annotation types not present in the gold standard), the matched annotations remain the same, while the total annotations found in those categories drop to the following:
Total annotations found: FastUMLSProcessing: 12,811; UMLSProcessing: 46,571
IMAT Solutions http://imatsolutions.com
Bruce Tietjen, Senior Software Engineer, Mobile: 801.634.1547, bruce.tiet...@imatsolutions.com
RE: cTakes Annotation Comparison
Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth comparing against an adjudicated, human-annotated gold standard. --Pei
-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] Sent: Thursday, December 18, 2014 1:45 PM To: dev@ctakes.apache.org Subject: cTakes Annotation Comparison
With the recent release of cTakes 3.2.1, we were very interested in checking for any differences in annotations between the AggregatePlaintextUMLSProcessor pipeline and the AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes with its associated set of UMLS resources. We chose the SHARE 14-a-b Training data, which consists of 199 documents (Discharge 61, ECG 54, Echo 42, Radiology 42), as the basis for the comparison. We decided to share a summary of the results with the development community.
Documents processed: 199
Processing time: UMLSProcessor 2,439 seconds; FastUMLSProcessor 1,837 seconds
Total annotations reported: UMLSProcessor 20,365; FastUMLSProcessor 8,284
Annotation comparisons: common to both sets: 3,940; reported only by the UMLSProcessor: 16,425; reported only by the FastUMLSProcessor: 4,344
If anyone is interested, the following was our test procedure: We used the UIMA CPE to process the document set twice, once with the AggregatePlaintextUMLSProcessor pipeline and once with the AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS consumer to write the results to output files. We then used a tool we recently developed to analyze and compare the annotations generated by the two pipelines. The tool compares the two outputs for each file and reports any differences in the annotations (MedicationMention, SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and DiseaseDisorderMention) between the two output sets.
The tool reports the number of 'matches' and 'misses' between each annotation set. A 'match' is defined as an identified source-text interval, with its associated CUI, appearing in both annotation sets. A 'miss' is an identified source-text interval and its associated CUI appearing in one annotation set with no matching interval and CUI in the other. The tool also reports the total number of annotations (source-text intervals with associated CUIs) in each annotation set. The compare tool is in our GitHub repository at https://github.com/perfectsearch/cTAKES-compare
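The match/miss logic described above amounts to a set comparison over (begin, end, CUI) triples. A minimal illustration of that idea follows; the record name, fields, and sample CUI strings are hypothetical stand-ins, not the actual cTAKES-compare tool's code:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the comparison described above: each annotation is keyed by its
// source-text interval plus CUI; a 'match' appears in both sets, a 'miss'
// appears in exactly one. Names here are illustrative only.
public class AnnotationCompareSketch {
    record Key(int begin, int end, String cui) {}

    static int countMatches(Set<Key> a, Set<Key> b) {
        Set<Key> both = new HashSet<>(a);
        both.retainAll(b);               // intersection: exact span + CUI in both sets
        return both.size();
    }

    static int countMisses(Set<Key> a, Set<Key> b) {
        Set<Key> union = new HashSet<>(a);
        union.addAll(b);
        // symmetric difference: annotations present in only one of the two sets
        return union.size() - countMatches(a, b);
    }

    public static void main(String[] args) {
        Set<Key> run1 = Set.of(new Key(0, 12, "C0000001"), new Key(20, 26, "C0000002"));
        Set<Key> run2 = Set.of(new Key(0, 12, "C0000001"));
        System.out.println("matches: " + countMatches(run1, run2)); // shared triple
        System.out.println("misses:  " + countMisses(run1, run2));  // one-sided triple
    }
}
```

The overlap variant mentioned in the thread would relax the exact-interval equality to an interval-intersection test while still requiring the CUIs to agree.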
Re: cTakes Annotation Comparison
Actually, we are working on a similar tool to compare against the human-adjudicated standard for the set we tested. I didn't mention it before because the tool isn't complete yet, but initial results for the set (excluding annotations marked as CUI-less) were as follows:
Human-adjudicated annotations: 4591 (excluding CUI-less)
Annotations found matching the human-adjudicated standard: UMLSProcessor 2245; FastUMLSProcessor 215
IMAT Solutions http://imatsolutions.com
Bruce Tietjen, Senior Software Engineer, Mobile: 801.634.1547, bruce.tiet...@imatsolutions.com
On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: Bruce, thanks for this -- very useful. Perhaps Sean Finan can comment more, but it's also probably worth comparing against an adjudicated, human-annotated gold standard. --Pei