> DictionaryLookupAnnotator which is a container for the dictionaries and it iterates through the list of lookup dictionaries
I am confused. The new dictionary-lookup-fast has neither this class nor multiple dictionaries. The UMLS and RxNorm entries are in the same database table, and lookup is performed in one swoop. Could you please send a copy of your pipeline XMLs to me directly (instead of bombing the group) with something other than an .xml extension (they get blocked)?

________________________________
From: Bruce Tietjen [bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, October 09, 2014 11:41 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs

I tried the dictionary-lookup-fast module and the behavior is the same. I did have to run it a number of times before the timing was right to reproduce the issue. With the older lookup, the chances were about 50/50 as to which dictionary ran first. Using dictionary-lookup-fast, it seems more like 70/30, with the standard UMLS lookup more likely to run first than not. This means that most of the time, there is no MedicationMention annotation for Bacitracin. (See attached.)

The code with the issue is the DictionaryLookupAnnotator, which is a container for the dictionaries; it iterates through the list of lookup dictionaries, so that part of the code path does not seem to have changed. In the past, the RxNorm dictionary was a Lucene search, so I'm guessing it behaved a little differently than it does now with both being JDBC.

The fact that the filter is at this location seems to indicate that it may have been intended to apply across all dictionaries. On the other hand, it appears to mask out the lookups for the different dictionaries, resulting in some annotations not being made.

So, the real question is how the filter should work: should the annotation filtering be per lookup dictionary, or across all dictionaries? Or is there something wrong elsewhere that causes this?

I lean towards having the filter function per dictionary.
This may risk having duplicate annotations, but that would probably be better than missing the annotation altogether.

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
Hi Bruce,

With Pei's help I just updated the SourceForge repo with the cTakes dictionaries. Check out artifact ctakes-resources-snomed-rword-hsqldb-2011ab

Sean

-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:52 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs

If I understand correctly, I would need new dictionary resources to run the rare word lookup method. Where can I find the necessary dictionary(ies), or how do I build them?

Bruce Tietjen
Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> Hi Bruce,
>
> I would venture to say that this is neither expected nor desired.
>
> Before you fix it (or in addition to a fix), try to run with the new
> dictionary lookup. It will have a different behavior, and it will be the
> default dictionary lookup in future releases of cTakes, making fixes to
> the current module slightly less urgent.
>
> Sean
>
> From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> Sent: Wednesday, October 08, 2014 11:38 AM
> To: dev@ctakes.apache.org
> Subject: Differences in MedicationMention annotations on subsequent processing runs
>
> I have encountered a situation in which the cTakes clinical pipeline
> output differs between multiple runs on the same text with the same
> configuration.
>
> The following snippets from a single document are sufficient to
> demonstrate the issue:
>
> a gentle curve going into. irrigated with Bacitracin.
>
> The source of the difference is that the DictionaryLookupAnnotator uses a
> map to filter out duplicate annotations for a single document location:
>
> // used to prevent duplicate hits
> // key = hit begin,end key (java.lang.String)
> // val = Set of MetaDataHit objects
> private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
>
> This map is shared between both the umls_ms_2011ab lookup and the
> umls_ms_2011an_rxnorm lookup.
>
> If both dictionaries contain the same term, the order of dictionary lookup
> execution determines the output. If the rxnorm lookup runs first, then a
> MedicationMention annotation for Bacitracin appears in the final output. If
> the standard umls lookup runs first, then there is no MedicationMention
> annotation for Bacitracin.
>
> I will attach the output from the subsequent runs. (Hopefully the
> attachment will make it through the system.)
>
> Is this expected behavior? If not, what would be the expected behavior?
>
> Bruce Tietjen
> Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
> Mobile: 801.634.1547
> bruce.tiet...@imatsolutions.com
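The order dependence Bruce describes, and the per-dictionary filtering he proposes, can be illustrated with a small stand-alone sketch. This is not cTAKES code: `DupFilterSketch`, `Hit`, `termId`, `sharedFilter` and `perDictionaryFilter` are hypothetical names. It assumes, as the `Map<String,Set<MetaDataHit>>` declaration suggests, that two hits count as duplicates when they describe the same term at the same `begin,end` span, regardless of which dictionary produced them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not cTAKES code: shared vs. per-dictionary duplicate filtering. */
public class DupFilterSketch {

    /** Hypothetical stand-in for a lookup hit: a span, a term id, and its source dictionary. */
    static final class Hit {
        final int begin;
        final int end;
        final String termId;     // assumed to be what makes two hits "duplicates"
        final String dictionary; // e.g. "umls" or "rxnorm"

        Hit(int begin, int end, String termId, String dictionary) {
            this.begin = begin;
            this.end = end;
            this.termId = termId;
            this.dictionary = dictionary;
        }

        /** Span key in the "begin,end" form described in the thread. */
        String spanKey() {
            return begin + "," + end;
        }
    }

    /** Records the hit; returns true if the same term was already seen at this span. */
    static boolean isDuplicate(Map<String, Set<String>> dupMap, Hit hit) {
        Set<String> seen = dupMap.computeIfAbsent(hit.spanKey(), k -> new HashSet<>());
        return !seen.add(hit.termId); // add() returns false when termId is already present
    }

    /** One map for every dictionary: whichever dictionary runs first claims the term. */
    static List<Hit> sharedFilter(List<List<Hit>> hitsPerDictionary) {
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (List<Hit> hits : hitsPerDictionary) {
            for (Hit hit : hits) {
                if (!isDuplicate(dupMap, hit)) {
                    kept.add(hit);
                }
            }
        }
        return kept;
    }

    /** A fresh map per dictionary: duplicates are dropped only within one dictionary. */
    static List<Hit> perDictionaryFilter(List<List<Hit>> hitsPerDictionary) {
        List<Hit> kept = new ArrayList<>();
        for (List<Hit> hits : hitsPerDictionary) {
            Map<String, Set<String>> dupMap = new HashMap<>();
            for (Hit hit : hits) {
                if (!isDuplicate(dupMap, hit)) {
                    kept.add(hit);
                }
            }
        }
        return kept;
    }

    /** Names of the dictionaries whose hits survived, in output order. */
    static List<String> sources(List<Hit> hits) {
        List<String> names = new ArrayList<>();
        for (Hit hit : hits) {
            names.add(hit.dictionary);
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "a gentle curve going into. irrigated with Bacitracin.";
        int begin = text.indexOf("Bacitracin");
        int end = begin + "Bacitracin".length();

        // Both dictionaries find the same (hypothetical) term id at the same span.
        List<Hit> umls = List.of(new Hit(begin, end, "term-1", "umls"));
        List<Hit> rxnorm = List.of(new Hit(begin, end, "term-1", "rxnorm"));

        System.out.println(sources(sharedFilter(List.of(umls, rxnorm))));        // [umls]
        System.out.println(sources(sharedFilter(List.of(rxnorm, umls))));        // [rxnorm]
        System.out.println(sources(perDictionaryFilter(List.of(umls, rxnorm)))); // [umls, rxnorm]
    }
}
```

With one shared map, only the first dictionary's hit survives, which matches the reported symptom of the MedicationMention appearing or not depending on run order. With one map per dictionary, both hits survive regardless of order, at the cost of the possible duplicate annotations mentioned in the thread.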