Peter, I have also seen text spans map to different CUIs depending on which vocabularies are included. I believe I had a similar conversation with Sean on the dev forum last year.
I do not believe the behavior of 'wbc' has changed- if I run the clinical pipeline with sno_rx_16ab dictionary, it is tagged as an AnatomicalSiteMention. Are you seeing something different?

Jeff

On Wed, Aug 5, 2020 at 11:24 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

> Hi Jeff
>
> I thought I did load them all, but I'll go back and check.
>
> When looking at my gene issue the result is that the lookup arbitrarily
> (seemingly anyway) flips between one and another when there are overlaps
> between vocabularies. Ie. I see that both Vocab A & B both contain geneX
> and geneY. Neither of these are in SNOMED. So in my output, I get one of
> the genes associated with Vocab A and another with Vocab B. When I remove
> Vocab B then obviously both are associated with Vocab A - which is what I
> wanted.
>
> If, for you, WBC is showing up as an anatomical location, rather than a
> T059 then probably it's not getting the correct SNOMED code though.
> Wouldn't that be a problem for your researchers?
>
> Peter
>
> On Wed, Aug 5, 2020 at 5:37 PM Jeffrey Miller <jeff...@gmail.com> wrote:
>
> > Hi Peter,
> >
> > If I create a dictionary using UMLS 2020aa with just snomed and rxnorm my
> > cTAKES dictionary still seems to have a CUI associated with the string
> > 'wbc' that links to the snomed term for Leukocyte (Cell). It is not mapping
> > to a lab result TUI, but rather an anatomical site, but it seems to be the
> > same CUI that 'wbc' resolves to in sno_rx_16ab. Maybe HGNC is conflicting
> > with that too?
> >
> > Just to double check, when you installed UMLS through Metamorphosys, did
> > you install all of the available vocabularies?
> >
> > Jeff
> >
> > On Wed, Aug 5, 2020 at 6:52 PM Peter Abramowitsch <pabramowit...@gmail.com>
> > wrote:
> >
> > > Hi All
> > >
> > > I've been setting up a custom dictionary using UMLS with the goal of simply
> > > adding a comprehensive genetic vocabulary HGNC to the latest UMLS SNOMED
> > > and RXNORM vocabularies in the hope of getting somewhere close to the
> > > cTakes default dictionary again.
> > >
> > > However, there are changes to concept vocabularies in UMLS2020AA that
> > > affect the ability of cTakes to work well with older notes and possibly the
> > > note-writing practices of older physicians and labs. Some of the tried
> > > and true acronyms such as WBC for leukocytes, RBC, and EOS (eosinophil
> > > count) are no longer part of SNOMED. Probably this is because the
> > > components of these parameters are now broken out into more granular
> > > types. The other reason this may be is that a few of these acronyms now
> > > overlap the names of Genes. EOS is one of them. This is just speculation.
> > >
> > > In order to have these common parameters re-included via their common lab
> > > acronyms, it is necessary to add another common US vocabulary such as
> > > HL7-V3.0 or NCI_CDISC. Of course one can remap back into SNOMED by adding
> > > insert statements into the dictionary script, but it might be a
> > > non-scalable exercise.
> > >
> > > So my point here is that if, one day, we plan to create a new cTakes
> > > release, and with it, a new UMLS lookup, we may need to consider adding a
> > > third basic vocabulary into our current set of two.
> > >
> > > Thoughts?
> > > Peter
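
The flipping between Vocab A and Vocab B described in the thread goes away if the lookup applies an explicit vocabulary priority whenever a term appears in more than one source. Below is a minimal Python sketch of that idea, assuming a toy in-memory table. The rows, the placeholder concept ids (not real UMLS CUIs), and the build_lookup helper are all hypothetical and are not part of cTAKES or its dictionary tooling.

```python
from collections import defaultdict

# Hypothetical toy rows: (term, concept id, source vocabulary).
# The concept ids are placeholders, not real UMLS CUIs.
ROWS = [
    ("geneX", "CUI_X_FROM_A", "VOCAB_A"),
    ("geneX", "CUI_X_FROM_B", "VOCAB_B"),
    ("geneY", "CUI_Y_FROM_B", "VOCAB_B"),
    ("geneY", "CUI_Y_FROM_A", "VOCAB_A"),
]


def build_lookup(rows, priority):
    """Map each lowercased term to one (concept id, vocabulary) pair.

    `priority` lists vocabularies from most to least preferred.
    Rows from vocabularies not in `priority` are ignored.
    """
    rank = {vocab: i for i, vocab in enumerate(priority)}
    candidates = defaultdict(list)
    for term, cui, vocab in rows:
        if vocab in rank:
            candidates[term.lower()].append((rank[vocab], cui, vocab))
    # min() picks the lowest rank, i.e. the most preferred vocabulary,
    # regardless of the order in which the rows were loaded.
    return {term: min(hits)[1:] for term, hits in candidates.items()}


# Both genes resolve to VOCAB_A, even though the rows arrive in mixed order.
lookup = build_lookup(ROWS, priority=["VOCAB_A", "VOCAB_B"])
# Dropping VOCAB_A from the priority list leaves only the VOCAB_B entries.
lookup_b_only = build_lookup(ROWS, priority=["VOCAB_B"])
```

With the priority list in place, both genes come from the same vocabulary no matter how the rows are ordered, which matches the result Peter got by removing Vocab B outright. Whether an equivalent ordering can be imposed in the actual dictionary build is a separate question this sketch does not answer.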