Differences in MedicationMention annotations on subsequent processing runs

2014-10-08 Thread Bruce Tietjen
I have encountered a situation in which the cTakes clinical pipeline output
differs between multiple runs on the same text with the same configuration.

The following snippets from a single document are sufficient to demonstrate
the issue:

 a gentle curve going into. irrigated with Bacitracin.


The source of the difference is that the DictionaryLookupAnnotator uses a
map to filter out duplicate annotations for a single document location:

// used to prevent duplicate hits
// key = hit begin,end key (java.lang.String)
// val = Set of MetaDataHit objects
private Map<String, Set<MetaDataHit>> iv_dupMap = new HashMap<>();


This map is shared by both the umls_ms_2011ab lookup and the
umls_ms_2011an_rxnorm lookup.

If both dictionaries contain the same term, the order in which the dictionary
lookups execute determines the output. If the rxnorm lookup runs first, a
MedicationMention annotation for Bacitracin appears in the final output. If
the standard umls lookup runs first, there is no MedicationMention annotation
for Bacitracin.
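A minimal Java sketch of the effect described above. It assumes a simplified
model rather than the actual DictionaryLookupAnnotator code: the record `Hit`
is a hypothetical stand-in for MetaDataHit, the matched term text stands in for
MetaDataHit equality, and the offsets are illustrative. One map shared by both
lookups drops any hit whose begin,end key and term were already accepted, so
whichever lookup runs first keeps the hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SharedDupMapDemo {

    // Hypothetical stand-in for a dictionary hit: span, matched term, and the lookup that produced it.
    public record Hit(int begin, int end, String term, String lookup) {}

    // Runs the lookups in the given order, filtering duplicates with ONE map shared by all lookups.
    public static List<Hit> runShared(List<List<Hit>> lookupsInOrder) {
        // key = "begin,end", val = terms already accepted at that span (by any lookup)
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<Hit> accepted = new ArrayList<>();
        for (List<Hit> lookupHits : lookupsInOrder) {
            for (Hit hit : lookupHits) {
                String key = hit.begin() + "," + hit.end();
                Set<String> seen = dupMap.computeIfAbsent(key, k -> new HashSet<>());
                // Set.add returns false if the term is already there, so the first lookup wins.
                if (seen.add(hit.term())) {
                    accepted.add(hit);
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Illustrative offsets only; both dictionaries match the same span and term.
        List<Hit> umls = List.of(new Hit(45, 55, "bacitracin", "umls"));
        List<Hit> rxnorm = List.of(new Hit(45, 55, "bacitracin", "rxnorm"));

        System.out.println(runShared(List.of(rxnorm, umls))); // only the rxnorm hit survives
        System.out.println(runShared(List.of(umls, rxnorm))); // only the umls hit survives
    }
}
```

Records require Java 16 or later. The sketch does not say why the lookup order
differs between runs; it only shows that, with a shared first-wins map, the
output follows whatever order the lookups happen to run in.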

I will attach the output from the subsequent runs (hopefully the attachment
will make it through the system).

Is this expected behavior? If not, what would be the expected behavior?

IMAT Solutions
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com


RE: Differences in MedicationMention annotations on subsequent processing runs

2014-10-08 Thread Finan, Sean
Hi Bruce,
I would venture to say that this is neither expected nor desired.

Before you fix it (or in addition to a fix), try running with the new
dictionary lookup. It behaves differently, and it will be the default
dictionary lookup in future releases of cTakes, which makes fixes to the
current module slightly less urgent.
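One possible direction for such a fix, sketched under the assumption that each
dictionary's hits should survive independently (the thread does not settle
what the expected behavior is): include the lookup name in the duplicate-map
key, so duplicates are filtered only within a single lookup. `Hit`,
`runPerLookup`, and the offsets are hypothetical, and this is not the cTakes
implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerLookupDupMapDemo {

    // Hypothetical stand-in for a dictionary hit: span, matched term, and the lookup that produced it.
    public record Hit(int begin, int end, String term, String lookup) {}

    // Filters duplicates only within the same lookup: the key includes the lookup name,
    // so a hit from one dictionary can no longer suppress a hit from another.
    public static List<Hit> runPerLookup(List<List<Hit>> lookupsInOrder) {
        // key = "lookup|begin,end", val = terms already accepted for that lookup at that span
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<Hit> accepted = new ArrayList<>();
        for (List<Hit> lookupHits : lookupsInOrder) {
            for (Hit hit : lookupHits) {
                String key = hit.lookup() + "|" + hit.begin() + "," + hit.end();
                // Set.add returns false for a repeat within the same lookup, which is dropped.
                if (dupMap.computeIfAbsent(key, k -> new HashSet<>()).add(hit.term())) {
                    accepted.add(hit);
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Illustrative offsets; the rxnorm list contains one genuine in-dictionary duplicate.
        List<Hit> umls = List.of(new Hit(45, 55, "bacitracin", "umls"));
        List<Hit> rxnorm = List.of(new Hit(45, 55, "bacitracin", "rxnorm"),
                                   new Hit(45, 55, "bacitracin", "rxnorm"));

        List<Hit> rxnormFirst = runPerLookup(List.of(rxnorm, umls));
        List<Hit> umlsFirst = runPerLookup(List.of(umls, rxnorm));

        // Same hits either way (compared as sets, since list order follows run order).
        System.out.println(new HashSet<>(rxnormFirst).equals(new HashSet<>(umlsFirst))); // true
        System.out.println(rxnormFirst.size()); // 2
    }
}
```

The trade-off is that a term found by both dictionaries now yields one hit per
dictionary; whether downstream consumers should merge those is a separate
decision.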

Sean

From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:38 AM
To: dev@ctakes.apache.org
Subject: Differences in MedicationMention annotations on subsequent processing runs




Re: Differences in MedicationMention annotations on subsequent processing runs

2014-10-08 Thread Bruce Tietjen
If I understand correctly, I would need new dictionary resources to run the
rare word lookup method.

Where can I find the necessary dictionary(ies) or how do I build them?



On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:



RE: Differences in MedicationMention annotations on subsequent processing runs

2014-10-08 Thread Finan, Sean
Good point. I tried to check the dictionaries in to SourceForge but had
problems. I will try again right now.

Building a custom dictionary is possible with the DictionaryTool in the cTakes
sandbox, but that is a different rabbit hole.

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:52 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs



RE: Differences in MedicationMention annotations on subsequent processing runs

2014-10-08 Thread Finan, Sean
Hi Bruce,

With Pei's help, I just updated the SourceForge repo with the cTakes
dictionaries. Check out the artifact ctakes-resources-snomed-rword-hsqldb-2011ab.

Sean
