RE: Differences in MedicationMention annotations on subsequent processing runs

Finan, Sean Thu, 09 Oct 2014 10:20:41 -0700

> DictionaryLookupAnnotator which is a container for the dictionaries and it 
> iterates through the list of lookup dictionaries

I am confused.  The new dictionary-lookup-fast has neither this class nor 
multiple dictionaries.  The umls and rxnorm are in the same database table and 
lookup is performed in one swoop.  Could you please send a copy of your 
pipeline xmls to me directly (instead of bombing the group) with something 
other than an .xml extension (they get blocked)?

________________________________
From: Bruce Tietjen [bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, October 09, 2014 11:41 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent 
processing runs

I tried the dictionary-lookup-fast module and the behavior is the same. I did 
have to run it a number of times before the timing was right to reproduce the 
issue. With the older lookup, chances were about 50/50 as to which dictionary 
ran first. With dictionary-lookup-fast, it seems more like 70/30, with the 
standard umls lookup being more likely to run first. This means that 
most of the time, there is no MedicationMention annotation for Bacitracin.  
(See Attached)

The code with the issue is the DictionaryLookupAnnotator which is a container 
for the dictionaries and it iterates through the list of lookup dictionaries so 
that part of the code path does not seem to have changed.

In the past, the rxNorm dictionary was a Lucene search and so I'm guessing it 
behaved a little differently than it does now with both being JDBC.

The fact that the filter is at this location seems to indicate that it may have 
been intended to apply across all dictionaries. On the other hand, it 
appears to mask out the lookups for the different dictionaries, resulting in 
some annotations not being made.

So, the real question is how should the filter work -- should the annotation 
filtering be per lookup dictionary, or be across all dictionaries? Or is there 
something wrong elsewhere that causes

I lean towards having the filter function per dictionary. This may risk having 
duplicate annotations, but that would probably be better than missing the 
annotation altogether.
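
To make the two options concrete, here is a minimal sketch of a shared filter 
versus a per-dictionary filter. It is not the cTAKES code: the Hit class, the 
"OtherMention" type, the BACITRACIN_CODE value, and the 42-52 offsets are all 
hypothetical stand-ins, and it assumes a hit counts as a duplicate when the 
same span and the same code have already been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DupFilterSketch {

    /** Hypothetical stand-in for a dictionary hit; not a cTAKES class. */
    static final class Hit {
        final int begin;
        final int end;
        final String code;            // concept code, assumed identical in both dictionaries
        final String annotationType;  // annotation the owning dictionary would create

        Hit(int begin, int end, String code, String annotationType) {
            this.begin = begin;
            this.end = end;
            this.code = code;
            this.annotationType = annotationType;
        }
    }

    /** One duplicate map for all dictionaries: the first dictionary to report a hit wins. */
    static List<String> annotateShared(List<List<Hit>> dictionariesInRunOrder) {
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<String> annotations = new ArrayList<>();
        for (List<Hit> hits : dictionariesInRunOrder) {
            collect(hits, dupMap, annotations);
        }
        return annotations;
    }

    /** A fresh duplicate map per dictionary: run order no longer decides which hits survive. */
    static List<String> annotatePerDictionary(List<List<Hit>> dictionariesInRunOrder) {
        List<String> annotations = new ArrayList<>();
        for (List<Hit> hits : dictionariesInRunOrder) {
            collect(hits, new HashMap<>(), annotations);
        }
        return annotations;
    }

    /** Keeps a hit only if its span/code pair is not already recorded in dupMap. */
    private static void collect(List<Hit> hits, Map<String, Set<String>> dupMap,
                                List<String> annotations) {
        for (Hit hit : hits) {
            String spanKey = hit.begin + "," + hit.end;
            Set<String> seenCodes = dupMap.computeIfAbsent(spanKey, k -> new HashSet<>());
            if (seenCodes.add(hit.code)) {
                annotations.add(hit.annotationType);
            }
        }
    }

    /** Two toy dictionaries that both contain the same term at the same span. */
    static List<List<Hit>> dictionaries(boolean rxnormFirst) {
        List<Hit> umls = Arrays.asList(new Hit(42, 52, "BACITRACIN_CODE", "OtherMention"));
        List<Hit> rxnorm = Arrays.asList(new Hit(42, 52, "BACITRACIN_CODE", "MedicationMention"));
        return rxnormFirst ? Arrays.asList(rxnorm, umls) : Arrays.asList(umls, rxnorm);
    }

    static List<String> runShared(boolean rxnormFirst) {
        return annotateShared(dictionaries(rxnormFirst));
    }

    static List<String> runPerDictionary(boolean rxnormFirst) {
        return annotatePerDictionary(dictionaries(rxnormFirst));
    }

    public static void main(String[] args) {
        System.out.println("shared, rxnorm first:  " + runShared(true));
        System.out.println("shared, umls first:    " + runShared(false));
        System.out.println("per dict, umls first:  " + runPerDictionary(false));
    }
}
```

With the shared map, the MedicationMention only survives when the rxnorm 
lookup runs first. With one map per dictionary, both hits are kept regardless 
of order, which is the duplicate-annotation risk mentioned above.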

[IMAT Solutions]<http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
[Mobile:] 801.634.1547
bruce.tiet...@imatsolutions.com<mailto:bruce.tiet...@imatsolutions.com>

On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean 
<sean.fi...@childrens.harvard.edu<mailto:sean.fi...@childrens.harvard.edu>> 
wrote:
Hi Bruce,

With Pei's help I just updated the sourceforge repo with the cTakes 
dictionaries.  Check out the artifact ctakes-resources-snomed-rword-hsqldb-2011ab

Sean

-----Original Message-----
From: Bruce Tietjen 
[mailto:bruce.tiet...@perfectsearchcorp.com<mailto:bruce.tiet...@perfectsearchcorp.com>]
Sent: Wednesday, October 08, 2014 11:52 AM
To: dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
Subject: Re: Differences in MedicationMention annotations on subsequent 
processing runs

If I understand correctly, I would need new dictionary resources to run the
rare word lookup method.

Where can I find the necessary dictionary(ies) or how do I build them?

 [image: IMAT Solutions] <http://imatsolutions.com>
 Bruce Tietjen
Senior Software Engineer
[image: Mobile:] 801.634.1547<tel:801.634.1547>
bruce.tiet...@imatsolutions.com<mailto:bruce.tiet...@imatsolutions.com>

On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu<mailto:sean.fi...@childrens.harvard.edu>> 
wrote:

>  Hi Bruce,
>
> I would venture to say that this is neither expected nor desired.
>
>
>
> Before you fix it (or in addition to a fix), try to run with the new
> dictionary lookup.   It will have a different behavior, and it will be the
> default dictionary lookup in future releases of cTakes – making fixes to
> the current module slightly less urgent.
>
>
>
> Sean
>
>
>
> *From:* Bruce Tietjen 
> [mailto:bruce.tiet...@perfectsearchcorp.com<mailto:bruce.tiet...@perfectsearchcorp.com>]
> *Sent:* Wednesday, October 08, 2014 11:38 AM
> *To:* dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
> *Subject:* Differences in MedicationMention annotations on subsequent
> processing runs
>
>
>
>
>
> I have encountered a situation in which the cTakes clinical pipeline
> output differs between multiple runs on the same text with the same
> configuration.
>
> The following snippets from a single document are sufficient to
> demonstrate the issue:
>
>  a gentle curve going into. irrigated with Bacitracin.
>
>
>
> The source of the difference is that the DictionaryLookupAnnotator uses a
> map to filter out duplicate annotations for a single document location:
>
>     // used to prevent duplicate hits
>     // key = hit begin,end key (java.lang.String)
>     // val = Set of MetaDataHit objects
>     private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
>
> This map is shared between the umls_ms_2011ab lookup and the
> umls_ms_2011an_rxnorm lookup.
>
>
>
> If both dictionaries contain the same term, the order of dictionary lookup
> execution determines the output. If the rxnorm lookup runs first, then a
> MedicationMention annotation for Bacitracin appears in the final output. If
> the standard umls lookup runs first, then there is no MedicationMention
> annotation for Bacitracin.
>
> I will attach the output from the subsequent runs. (Hopefully the
> attachment will make it through the system)
>
>
>
> Is this expected behavior? If not, what would be the expected behavior?
>
>
>
> [IMAT Solutions] <http://imatsolutions.com>
>
> *Bruce Tietjen*
> Senior Software Engineer
> [Mobile:] 801.634.1547<tel:801.634.1547>
> bruce.tiet...@imatsolutions.com<mailto:bruce.tiet...@imatsolutions.com>
>
